CHAPTER 1


Matrices 


Erstlich wird alles dasjenige eine Größe genennt,
welches einer Vermehrung oder einer Verminderung fähig ist,
oder wozu sich noch etwas hinzusetzen oder davon wegnehmen läßt.


- Leonhard Euler¹


Matrices play a central role in this book. They form an important part of the theory, and 
many concrete examples are based on them. Therefore it is essential to develop facility in 
matrix manipulation. Since matrices pervade mathematics, the techniques you will need are 
sure to be useful elsewhere. 


1.1 THE BASIC OPERATIONS 


Let m and n be positive integers. An m × n matrix is a collection of mn numbers arranged
in a rectangular array

\[
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots &        & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\quad \text{($m$ rows, $n$ columns)}
\tag{1.1.1}
\]

For example, $\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix}$ is a 2 × 3 matrix (two rows and three columns). We usually introduce
a symbol such as A to denote a matrix.

The numbers in a matrix are the matrix entries. They may be denoted by $a_{ij}$, where i
and j are indices (integers) with 1 ≤ i ≤ m and 1 ≤ j ≤ n; the index i is the row index, and
j is the column index. So $a_{ij}$ is the entry that appears in the ith row and jth column of the
matrix.


¹ This is the opening sentence of Euler’s book Algebra, which was published in St. Petersburg in 1770.
In the above example, $a_{11} = 2$, $a_{13} = 0$, and $a_{23} = 5$. We sometimes denote the matrix
whose entries are $a_{ij}$ by $(a_{ij})$.

An n × n matrix is called a square matrix. A 1 × 1 matrix [a] contains a single number,
and we do not distinguish such a matrix from its entry.

A 1 × n matrix is an n-dimensional row vector. We drop the index i when m = 1 and
write a row vector as

\[
[a_1 \;\cdots\; a_n], \quad \text{or as} \quad (a_1, \ldots, a_n).
\]

Commas in such a row vector are optional. Similarly, an m × 1 matrix is an
m-dimensional column vector:

\[
\begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}.
\]


In most of this book, we won’t make a distinction between an n-dimensional column vector 
and the point of n-dimensional space with the same coordinates. In the few places where the 
distinction is useful, we will state this clearly.


Addition of matrices is defined in the same way as vector addition. Let $A = (a_{ij})$ and
$B = (b_{ij})$ be two m × n matrices. Their sum A + B is the m × n matrix $S = (s_{ij})$ defined by

\[
s_{ij} = a_{ij} + b_{ij}.
\]

Thus

\[
\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix}
+ \begin{bmatrix} 1 & 0 & 3 \\ 4 & -3 & 1 \end{bmatrix}
= \begin{bmatrix} 3 & 1 & 3 \\ 5 & 0 & 6 \end{bmatrix}.
\]

Addition is defined only when the matrices to be added have the same shape, that is, when they
are m × n matrices with the same m and n.
Scalar multiplication of a matrix by a number is also defined as with vectors. The result
of multiplying an m × n matrix A by a number c is another m × n matrix $B = (b_{ij})$, where
$b_{ij} = c\,a_{ij}$ for all i, j. Thus

\[
2 \begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix}
= \begin{bmatrix} 4 & 2 & 0 \\ 2 & 6 & 10 \end{bmatrix}.
\]


Numbers will also be referred to as scalars. Let’s assume for now that the scalars are real 
numbers. In later chapters other scalars will appear. Just keep in mind that, except for 
occasional reference to the geometry of real two- or three-dimensional space, everything in 
this chapter continues to hold when the scalars are complex numbers. 


The complicated operation is matrix multiplication. The first case to learn is the product 
AB of a row vector A and a column vector B, which is defined when both are the same size, 
say m. If the entries of A and B are denoted by $a_i$ and $b_i$, respectively, the product AB is the
1 × 1 matrix, or scalar,

\[
a_1b_1 + a_2b_2 + \cdots + a_mb_m.
\tag{1.1.2}
\]

Thus

\[
\begin{bmatrix} 1 & 3 & 5 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix}
= 1 - 3 + 20 = 18.
\]


The usefulness of this definition becomes apparent when we regard A and B as vectors that 
represent indexed quantities. For example, consider a candy bar containing m ingredients. 
Let $a_i$ denote the number of grams of (ingredient)$_i$ per bar, and let $b_i$ denote the cost of
(ingredient)$_i$ per gram. The matrix product AB computes the cost per bar:


(grams/bar) · (cost/gram) = (cost/bar).


In general, the product of two matrices $A = (a_{ij})$ and $B = (b_{ij})$ is defined when the
number of columns of A is equal to the number of rows of B. If A is an ℓ × m matrix and B is
an m × n matrix, then the product will be an ℓ × n matrix. Symbolically,

\[
(\ell \times m) \cdot (m \times n) = (\ell \times n).
\]

The entries of the product matrix are computed by multiplying all rows of A by all columns
of B, using the rule (1.1.2). If we denote the product matrix AB by $P = (p_{ij})$, then

\[
p_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}.
\tag{1.1.3}
\]

This is the product of the ith row of A and the jth column of B.

For example,

\[
\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix}
= \begin{bmatrix} 1 \\ 18 \end{bmatrix}.
\tag{1.1.4}
\]
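As a rough computational sketch of rule (1.1.3), the following Python function multiplies two matrices stored as lists of rows. The name mat_mul and the list-of-rows representation are assumptions made for this illustration; they are not notation used in the text.

```python
def mat_mul(A, B):
    """Multiply an l x m matrix A by an m x n matrix B (both lists of rows)."""
    l, m, n = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    # p_ij is the product of (row i) of A and (column j) of B, as in (1.1.3).
    return [[sum(A[i][v] * B[v][j] for v in range(m)) for j in range(n)]
            for i in range(l)]


A = [[2, 1, 0],
     [1, 3, 5]]
X = [[1], [-1], [4]]          # a column vector, stored as a 3 x 1 matrix

print(mat_mul(A, X))                          # [[1], [18]]
print(mat_mul([[1, 3, 5]], X))                # [[18]]
```

The second call is the row-times-column case (1.1.2); the general product simply repeats it for every row of A and every column of B.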
This definition of matrix multiplication has turned out to provide a very convenient
computational tool. Going back to our candy bar example, suppose that there are ℓ candy
bars. We may form the ℓ × m matrix A whose ith row measures the ingredients of (bar)$_i$. If
the cost is to be computed each year for n years, we may form the m × n matrix B whose jth
column measures the cost of the ingredients in (year)$_j$. Again, the matrix product AB = P
computes cost per bar: $p_{ij}$ = cost of (bar)$_i$ in (year)$_j$.


One reason for matrix notation is to provide a shorthand way of writing linear 
equations. The system of equations 


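\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

can be written in matrix notation as

\[
AX = B
\tag{1.1.5}
\]

where A denotes the matrix of coefficients, X and B are column vectors, and AX is the
matrix product:

\[
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots &        & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}.
\]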


We may refer to an equation of this form simply as an “equation” or as a “system.”


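The matrix equation

\[
\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 1 \\ 18 \end{bmatrix}
\]

represents the following system of two equations in three unknowns:

\[
\begin{aligned}
2x_1 + x_2 &= 1 \\
x_1 + 3x_2 + 5x_3 &= 18.
\end{aligned}
\]

Equation (1.1.4) exhibits one solution, $x_1 = 1$, $x_2 = -1$, $x_3 = 4$. There are others.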
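To illustrate the remark that there are other solutions, here is a small Python check. The one-parameter family (t, 1 − 2t, 3 + t) is worked out only for this illustration (take $x_1 = t$, solve the first equation for $x_2$, then the second for $x_3$), and the function names are hypothetical.

```python
def is_solution(x1, x2, x3):
    """Test the two equations 2*x1 + x2 = 1 and x1 + 3*x2 + 5*x3 = 18."""
    return 2 * x1 + x2 == 1 and x1 + 3 * x2 + 5 * x3 == 18


def family(t):
    """A one-parameter family of solutions, obtained by taking x1 = t."""
    return (t, 1 - 2 * t, 3 + t)


print(family(1))                                             # (1, -1, 4)
print(all(is_solution(*family(t)) for t in range(-5, 6)))    # True
```

The value t = 1 recovers the solution exhibited by (1.1.4).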


The sum (1.1.3) that defines the product matrix can also be written in summation or
“sigma” notation as

\[
p_{ij} = \sum_{\nu=1}^{m} a_{i\nu}b_{\nu j} = \sum_{\nu} a_{i\nu}b_{\nu j}.
\tag{1.1.6}
\]
Each of these expressions for $p_{ij}$ is a shorthand notation for the sum. The large sigma
indicates that the terms with the indices ν = 1, ..., m are to be added up. The right-hand
notation indicates that one should add the terms with all possible indices ν. It is assumed
that the reader will understand that, if A is an ℓ × m matrix and B is an m × n matrix, the
indices should run from 1 to m. We’ve used the Greek letter “nu,” an uncommon symbol
elsewhere, to distinguish the index of summation clearly.

Our two most important notations for handling sets of numbers are the summation 
notation, as used above, and matrix notation. The summation notation is the more versatile 
of the two, but because matrices are more compact, we use them whenever possible. One
of our tasks in later chapters will be to translate complicated mathematical structures into 
matrix notation in order to be able to work with them conveniently. 

Various identities are satisfied by the matrix operations. The distributive laws

\[
A(B + B') = AB + AB', \quad \text{and} \quad (A + A')B = AB + A'B
\tag{1.1.7}
\]

and the associative law

\[
(AB)C = A(BC)
\tag{1.1.8}
\]

are among them. These laws hold whenever the matrices involved have suitable sizes, so
that the operations are defined. For the associative law, the sizes should be A = ℓ × m,
B = m × n, and C = n × p, for some ℓ, m, n, p. Since the two products (1.1.8) are equal,
parentheses are not necessary, and we will denote the triple product by ABC. It is an ℓ × p
matrix. For example, the two ways of computing the triple product

\[
ABC = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}
\]

are

\[
(AB)C = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 0 & 2 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}
\quad \text{and} \quad
A(BC) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\begin{bmatrix} 2 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}.
\]

Scalar multiplication is compatible with matrix multiplication in the obvious sense:

\[
c(AB) = (cA)B = A(cB).
\tag{1.1.9}
\]

The proofs of these identities are straightforward and not very interesting.

However, the commutative law does not hold for matrix multiplication, that is,

\[
AB \neq BA, \quad \text{usually}.
\tag{1.1.10}
\]
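Even when both matrices are square, the two products tend to be different. For instance,

\[
\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 3 & 1 \\ 0 & 0 \end{bmatrix},
\quad \text{while} \quad
\begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}.
\]

If it happens that AB = BA, the two matrices are said to commute.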
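A quick way to see the failure of commutativity is to compute both products for a concrete pair. In the sketch below, the two matrices are chosen purely for illustration, and mat_mul is a hypothetical helper, not notation from the text.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 1], [0, 0]]
B = [[2, 0], [1, 1]]
I = [[1, 0], [0, 1]]

print(mat_mul(A, B))                     # [[3, 1], [0, 0]]
print(mat_mul(B, A))                     # [[2, 2], [1, 1]]
print(mat_mul(A, I) == mat_mul(I, A))    # True: A commutes with the identity
```

So A and B do not commute, while every square matrix commutes with the identity matrix of the same size.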

Since matrix multiplication isn’t commutative, we must be careful when working with 
matrix equations. We can multiply both sides of an equation B = C on the left by a 
matrix A, to conclude that AB = AC, provided that the products are defined. Similarly, 
if the products are defined, we can conclude that BA = CA. We cannot derive AB = CA
from B = C.

A matrix all of whose entries are 0 is called a zero matrix, and if there is no danger of 
confusion, it will be denoted simply by 0. 

The entries $a_{ii}$ of a matrix A are its diagonal entries. A matrix A is a diagonal matrix
if its only nonzero entries are diagonal entries. (The word nonzero simply means “different
from zero.” It is ugly, but so convenient that we will use it frequently.)

The diagonal n × n matrix all of whose diagonal entries are equal to 1 is called the n × n
identity matrix, and is denoted by $I_n$. It behaves like the number 1 in multiplication: If A is
an m × n matrix, then


\[
AI_n = A \quad \text{and} \quad I_mA = A.
\tag{1.1.11}
\]

We usually omit the subscript and write I for $I_n$.

Here are some shorthand ways of depicting the identity matrix:

\[
I = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix}
= \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix}.
\]

We often indicate that a whole region in a matrix consists of zeros by leaving it blank or by
putting in a single 0.

We use * to indicate an arbitrary undetermined entry of a matrix. Thus

\[
\begin{bmatrix} * & \cdots & * \\ & \ddots & \vdots \\ & & * \end{bmatrix}
\]

may denote a square matrix A whose entries below the diagonal are 0, the other entries
being undetermined. Such a matrix is called upper triangular. The matrices that appear in
(1.1.14) below are upper triangular.


Let A be a (square) n × n matrix. If there is a matrix B such that

\[
AB = I_n \quad \text{and} \quad BA = I_n,
\tag{1.1.12}
\]

then B is called an inverse of A and is denoted by $A^{-1}$:

\[
A^{-1}A = I = AA^{-1}.
\tag{1.1.13}
\]

A matrix A that has an inverse is called an invertible matrix.


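For example, the matrix $A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$ is invertible. Its inverse is $A^{-1} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}$, as
can be seen by computing the products $AA^{-1}$ and $A^{-1}A$. Two more examples:

\[
\begin{bmatrix} 1 & \\ & 2 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & \\ & \tfrac12 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1 & 1 \\ & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & -1 \\ & 1 \end{bmatrix}.
\tag{1.1.14}
\]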


We will see later that a square matrix A is invertible if there is a matrix B such that either
one of the two relations $AB = I_n$ or $BA = I_n$ holds, and that B is then the inverse (see
(1.2.20)). But since multiplication of matrices isn’t commutative, this fact is not obvious. On
the other hand, an inverse is unique if it exists. The next lemma shows that there can be only 
one inverse of a matrix A: 


Lemma 1.1.15 Let A be a square matrix that has a right inverse, a matrix R such that AR = I,
and also a left inverse, a matrix L such that LA = I. Then R = L. So A is invertible and R is
its inverse.

Proof. R = IR = (LA)R = L(AR) = LI = L. □

Proposition 1.1.16 Let A and B be invertible n × n matrices. The product AB and the inverse
$A^{-1}$ are invertible, $(AB)^{-1} = B^{-1}A^{-1}$ and $(A^{-1})^{-1} = A$. If $A_1, \ldots, A_m$ are invertible n × n
matrices, the product $A_1 \cdots A_m$ is invertible, and its inverse is $A_m^{-1} \cdots A_1^{-1}$.

Proof. Assume that A and B are invertible. To show that the product $B^{-1}A^{-1} = Q$ is the
inverse of AB = P, we simplify the products PQ and QP, obtaining I in both cases. The
verification of the other assertions is similar. □


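The inverse of $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}$ is $\begin{bmatrix} 1 & 0 \\ 0 & \tfrac12 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & \tfrac12 \end{bmatrix}$.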
• It is worthwhile to memorize the inverse of a 2 × 2 matrix:

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}
= \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\tag{1.1.17}
\]


The denominator ad — bc is the determinant of the matrix. If the determinant is zero, the 
matrix is not invertible. We discuss determinants in Section 1.4. 
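Formula (1.1.17) translates directly into a few lines of Python. This is a minimal sketch: the function name inverse_2x2 is hypothetical, the sample matrices are chosen for illustration, and exact rational arithmetic (the standard fractions module) is used so that no rounding occurs.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Invert [[a, b], [c, d]] using formula (1.1.17)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("determinant is zero; the matrix is not invertible")
    f = Fraction(1) / det          # the factor 1 / (ad - bc)
    return [[d * f, -b * f],
            [-c * f, a * f]]


print(inverse_2x2([[2, 1], [5, 3]]) == [[3, -1], [-5, 2]])    # True
print(inverse_2x2([[1, 5], [2, 6]]))
```

For the second matrix the determinant is −4, so the entries of the inverse are the fractions −3/2, 5/4, 1/2, −1/4.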
Though this isn’t clear from the definition of matrix multiplication, we will see that most
square matrices are invertible, though finding the inverse explicitly is not a simple problem
when the matrix is large. The set of all invertible n × n matrices is called the n-dimensional
general linear group. It will be one of our most important examples when we introduce the
basic concept of a group in the next chapter.

For future reference, we note the following lemma: 


Lemma 1.1.18 A square matrix that has either a row of zeros or a column of zeros is not 
invertible. 


Proof. If a row of an n × n matrix A is zero and if B is any other n × n matrix, then the
corresponding row of the product AB is zero too. So AB is not the identity. Therefore A has
no right inverse. A similar argument shows that if a column of A is zero, then A has no left
inverse. □


Block Multiplication 


Various tricks simplify matrix multiplication in favorable cases; block multiplication is one 
of them. Let M and M’ be m × n and n × p matrices, and let r be an integer less than n. We
may decompose the two matrices into blocks as follows:

\[
M = [A \,|\, B] \quad \text{and} \quad M' = \begin{bmatrix} A' \\ B' \end{bmatrix},
\]

where A has r columns and A’ has r rows. Then the matrix product can be computed as

\[
MM' = AA' + BB'.
\tag{1.1.19}
\]


Notice that this formula is the same as the rule for multiplying a row vector and a column 
vector. 


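We may also multiply matrices divided into four blocks. Suppose that we decompose an
m × n matrix M and an n × p matrix M’ into rectangular submatrices

\[
M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \qquad
M' = \begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix},
\]

where the number of columns of A and C are equal to the number of rows of A’ and B’. In
this case the rule for block multiplication is the same as for multiplication of 2 × 2 matrices:

\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix}
= \begin{bmatrix} AA' + BC' & AB' + BD' \\ CA' + DC' & CB' + DD' \end{bmatrix}.
\tag{1.1.20}
\]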


These rules can be verified directly from the definition of matrix multiplication. 
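The rule (1.1.20) can also be checked numerically. The sketch below cuts two sample matrices into four blocks, multiplies block by block, and compares the result with the ordinary product. All function names and the sample matrices are assumptions made for this illustration, and the cut positions are assumed to leave every block nonempty.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def mat_add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M, r, s):
    """Cut M after row r and column s into blocks A, B, C, D."""
    return ([row[:s] for row in M[:r]], [row[s:] for row in M[:r]],
            [row[:s] for row in M[r:]], [row[s:] for row in M[r:]])


def join(A, B, C, D):
    """Reassemble four blocks into one matrix."""
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]


def block_mul(M, N, r, s, t):
    """Product MN via (1.1.20); M is cut at (r, s) and N at (s, t)."""
    A, B, C, D = split(M, r, s)
    A2, B2, C2, D2 = split(N, s, t)
    return join(mat_add(mat_mul(A, A2), mat_mul(B, C2)),
                mat_add(mat_mul(A, B2), mat_mul(B, D2)),
                mat_add(mat_mul(C, A2), mat_mul(D, C2)),
                mat_add(mat_mul(C, B2), mat_mul(D, D2)))


M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
N = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 1, 0, 1]]
print(block_mul(M, N, 1, 2, 3) == mat_mul(M, N))    # True
```

The only constraint is the one stated above: the column cut of M and the row cut of N must be at the same position s.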
Please use block multiplication to verify the equation


Posts fo of =f 36 
0 13515 O11 0 1 83 0 

Besides facilitating computations, block multiplication is a useful tool for proving facts 
about matrices by induction. 


Matrix Units 


The matrix units are the simplest nonzero matrices. The m × n matrix unit $e_{ij}$ has a 1 in the
i, j position as its only nonzero entry:

(1.1.21) $e_{ij}$ = the matrix with a single 1 in row i and column j, and zeros elsewhere.


We usually denote matrices by uppercase (capital) letters, but the use of a lowercase letter 
for a matrix unit is traditional. 


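• The set of matrix units is called a basis for the space of all m × n matrices, because every
m × n matrix $A = (a_{ij})$ is a linear combination of the matrices $e_{ij}$:

\[
A = a_{11}e_{11} + a_{12}e_{12} + \cdots = \sum_{i,j} a_{ij}e_{ij}.
\tag{1.1.22}
\]

The indices i, j under the sigma mean that the sum is to be taken over all i = 1, ..., m and
all j = 1, ..., n. For instance,

\[
\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}
= 3\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
+ 2\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
+ 1\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
+ 4\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
= 3e_{11} + 2e_{12} + 1e_{21} + 4e_{22}.
\]

The product of an m × n matrix unit $e_{ij}$ and an n × p matrix unit $e_{j\ell}$ is given by the formulas

\[
e_{ij}e_{j\ell} = e_{i\ell} \quad \text{and} \quad e_{ij}e_{k\ell} = 0 \ \text{ if } j \neq k.
\tag{1.1.23}
\]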


• The column vector $e_i$, which has a single nonzero entry 1 in the position i, is analogous
to a matrix unit, and the set $\{e_1, \ldots, e_n\}$ of these vectors forms what is called the standard
basis of the n-dimensional space $\mathbb{R}^n$ (see Chapter 3, (3.4.15)). If X is a column vector with
entries $(x_1, \ldots, x_n)$, then

\[
X = x_1e_1 + \cdots + x_ne_n = \sum_i x_ie_i.
\tag{1.1.24}
\]

The formulas for multiplying matrix units and standard basis vectors are

\[
e_{ij}e_j = e_i, \quad \text{and} \quad e_{ij}e_k = 0 \ \text{ if } j \neq k.
\tag{1.1.25}
\]
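The multiplication rules for matrix units are easy to test by machine. In this sketch the helper unit(m, n, i, j), which builds the m × n matrix unit with its 1 in row i and column j (indices counted from 1), is a hypothetical name introduced for the illustration, as are mat_mul and zero.

```python
def unit(m, n, i, j):
    """The m x n matrix unit with a 1 in position (i, j); indices start at 1."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(1, n + 1)]
            for r in range(1, m + 1)]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def zero(m, n):
    return [[0] * n for _ in range(m)]


# Matching inner indices: e_12 (2 x 3) times e_23 (3 x 4) is e_13 (2 x 4).
print(mat_mul(unit(2, 3, 1, 2), unit(3, 4, 2, 3)) == unit(2, 4, 1, 3))   # True
# Different inner indices: e_12 times e_33 is zero.
print(mat_mul(unit(2, 3, 1, 2), unit(3, 4, 3, 3)) == zero(2, 4))         # True

# A matrix as a linear combination of matrix units.
A = [[3, 2], [1, 4]]
S = zero(2, 2)
for i in (1, 2):
    for j in (1, 2):
        E = unit(2, 2, i, j)
        S = [[s + A[i - 1][j - 1] * e for s, e in zip(srow, erow)]
             for srow, erow in zip(S, E)]
print(S == A)                                                            # True
```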


1.2 ROW REDUCTION 


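Left multiplication by an m × n matrix A on n × p matrices, say

\[
AX = Y,
\tag{1.2.1}
\]

can be computed by operating on the rows of X. If we let $X_i$ and $Y_i$ denote the ith rows of
X and Y, respectively, then in vector notation,

\[
Y_i = a_{i1}X_1 + \cdots + a_{in}X_n,
\tag{1.2.2}
\]

\[
A \begin{bmatrix} \text{--}\,X_1\,\text{--} \\ \text{--}\,X_2\,\text{--} \\ \vdots \\ \text{--}\,X_n\,\text{--} \end{bmatrix}
= \begin{bmatrix} \text{--}\,Y_1\,\text{--} \\ \vdots \\ \text{--}\,Y_m\,\text{--} \end{bmatrix}.
\]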


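For instance, the bottom row of the product

\[
\begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 3 & 0 \\ 1 & 5 & -2 \end{bmatrix}
\]

can be computed as −2[1 2 1] + 3[1 3 0] = [1 5 −2].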

Left multiplication by an invertible matrix is called a row operation. We discuss these 
row operations next. Some square matrices called elementary matrices are used. There are 
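three types of elementary 2 × 2 matrices:

\[
\text{(i)} \ \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \text{ or } \begin{bmatrix} 1 & 0 \\ a & 1 \end{bmatrix}, \quad
\text{(ii)} \ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
\text{(iii)} \ \begin{bmatrix} c & 0 \\ 0 & 1 \end{bmatrix} \text{ or } \begin{bmatrix} 1 & 0 \\ 0 & c \end{bmatrix},
\tag{1.2.3}
\]

where a can be any scalar and c can be any nonzero scalar.

There are also three types of elementary n × n matrices. They are obtained by splicing
the elementary 2 × 2 matrices symmetrically into an identity matrix. They are shown below
with a 5 × 5 matrix to save space, but the size is supposed to be arbitrary.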
(1.2.4)

Type (i): in terms of matrix units, the matrix $I + a\,e_{ij}$ with $i \neq j$. The entry a lies above
the diagonal when i < j and below it when i > j.

One nonzero off-diagonal entry is added to the identity matrix.

Type (ii): the matrix $I - e_{ii} - e_{jj} + e_{ij} + e_{ji}$.

The ith and jth diagonal entries of the identity matrix are replaced by zero, and 1’s are
added in the (i, j) and (j, i) positions.

Type (iii): the matrix $I + (c - 1)e_{ii}$ with $c \neq 0$.

One diagonal entry of the identity matrix is replaced by a nonzero scalar c.


• The elementary matrices E operate on a matrix X this way: To get the matrix EX, you
must:

(1.2.5) Type (i): with a in the i, j position, “add a·(row j) of X to (row i),”
Type (ii): “interchange (row i) and (row j) of X,”
Type (iii): “multiply (row i) of X by a nonzero scalar c.”


These are the elementary row operations. Please verify the rules. 
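One way to verify the rules is to build the three types of elementary matrices and multiply. The following Python sketch does this for a sample 3 × 2 matrix X; the function names (elem_add, elem_swap, elem_scale) and the sample data are hypothetical, and rows are numbered from 1 as in the text.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def elem_add(n, i, j, a):
    """Type (i): the identity with the entry a placed in position (i, j), i != j."""
    E = identity(n)
    E[i - 1][j - 1] = a
    return E


def elem_swap(n, i, j):
    """Type (ii): the identity with rows i and j interchanged."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E


def elem_scale(n, i, c):
    """Type (iii): the identity with the (i, i) entry replaced by c != 0."""
    E = identity(n)
    E[i - 1][i - 1] = c
    return E


X = [[1, 2], [3, 4], [5, 6]]
print(mat_mul(elem_add(3, 1, 3, 2), X))    # [[11, 14], [3, 4], [5, 6]]
print(mat_mul(elem_swap(3, 1, 2), X))      # [[3, 4], [1, 2], [5, 6]]
print(mat_mul(elem_scale(3, 2, 5), X))     # [[1, 2], [15, 20], [5, 6]]
```

In the first product, 2·(row 3) has been added to (row 1), as the rule for Type (i) predicts.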


Lemma 1.2.6 Elementary matrices are invertible, and their inverses are also elementary 
matrices. 


Proof. The inverse of an elementary matrix is the matrix corresponding to the inverse row
operation: “subtract a·(row j) from (row i),” “interchange (row i) and (row j)” again, or
“multiply (row i) by $c^{-1}$.” □
We now perform elementary row operations (1.2.5) on a matrix M, with the aim of
ending up with a simpler matrix:

\[
M \longrightarrow \longrightarrow \cdots \longrightarrow M' \quad \text{(sequence of operations)}.
\]

Since each elementary operation is obtained by multiplying by an elementary matrix, we
can express the result of a sequence of such operations as multiplication by a sequence
$E_1, \ldots, E_k$ of elementary matrices:

\[
M' = E_k \cdots E_2E_1M.
\tag{1.2.7}
\]

This procedure to simplify a matrix is called row reduction.


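As an example, we use elementary operations to simplify a matrix by clearing out as
many entries as possible, working from the left.

\[
\begin{aligned}
M = \begin{bmatrix} 1 & 1 & 2 & 1 & 5 \\ 1 & 1 & 2 & 6 & 10 \\ 1 & 2 & 5 & 2 & 7 \end{bmatrix}
&\longrightarrow \begin{bmatrix} 1 & 1 & 2 & 1 & 5 \\ 0 & 0 & 0 & 5 & 5 \\ 0 & 1 & 3 & 1 & 2 \end{bmatrix}
\longrightarrow \begin{bmatrix} 1 & 1 & 2 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 \\ 0 & 0 & 0 & 5 & 5 \end{bmatrix} \\
&\longrightarrow \begin{bmatrix} 1 & 0 & -1 & 0 & 3 \\ 0 & 1 & 3 & 1 & 2 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}
\longrightarrow \begin{bmatrix} \mathbf{1} & 0 & -1 & 0 & 3 \\ 0 & \mathbf{1} & 3 & 0 & 1 \\ 0 & 0 & 0 & \mathbf{1} & 1 \end{bmatrix} = M'.
\end{aligned}
\tag{1.2.8}
\]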


The matrix M’ cannot be simplified further by row operations. 


Here is the way that row reduction is used to solve systems of linear equations. 
Suppose we are given a system of m equations in n unknowns, say AX = B, where A
is an m × n matrix, B is a given column vector, and X is an unknown column vector. To
solve this system, we form the m × (n + 1) block matrix, sometimes called the augmented
matrix

\[
M = [A|B] = \left[\begin{array}{ccc|c}
a_{11} & \cdots & a_{1n} & b_1 \\
\vdots &        & \vdots & \vdots \\
a_{m1} & \cdots & a_{mn} & b_m
\end{array}\right],
\tag{1.2.9}
\]

and we perform row operations to simplify M. Note that EM = [EA|EB]. Let

\[
M' = [A'|B']
\]

be the result of a sequence of row operations. The key observation is this:


Proposition 1.2.10 The systems A’X = B’ and AX = B have the same solutions. 
Proof. Since M’ is obtained by a sequence of elementary row operations, there are elementary
matrices $E_1, \ldots, E_k$ such that, with $P = E_k \cdots E_1$,

\[
M' = E_k \cdots E_1M = PM.
\]

The matrix P is invertible, and M’ = [A’|B’] = [PA|PB]. If X is a solution of the original
equation AX = B, we multiply by P on the left: PAX = PB, which is to say, A’X = B’.
So X also solves the new equation. Conversely, if A’X = B’, then $P^{-1}A'X = P^{-1}B'$, that is,
AX = B. □


For example, consider the system 


\[
\begin{aligned}
x_1 + x_2 + 2x_3 + x_4 &= 5 \\
x_1 + x_2 + 2x_3 + 6x_4 &= 10 \\
x_1 + 2x_2 + 5x_3 + 2x_4 &= 7.
\end{aligned}
\tag{1.2.11}
\]

Its augmented matrix is the matrix whose row reduction is shown above. The system of
equations is equivalent to the one defined by the end result M’ of the reduction:

\[
\begin{aligned}
x_1 - x_3 &= 3 \\
x_2 + 3x_3 &= 1 \\
x_4 &= 1.
\end{aligned}
\]

We can read off the solutions of this system easily: If we choose $x_3 = c$ arbitrarily, we can
solve for $x_1$, $x_2$, and $x_4$. The general solution of (1.2.11) can be written in the form

\[
x_3 = c, \quad x_1 = 3 + c, \quad x_2 = 1 - 3c, \quad x_4 = 1,
\]


where c is arbitrary. 
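The general solution can be checked by substituting it back into the three equations $x_1 + x_2 + 2x_3 + x_4 = 5$, $x_1 + x_2 + 2x_3 + 6x_4 = 10$, $x_1 + 2x_2 + 5x_3 + 2x_4 = 7$. The short Python sketch below does so for a range of integer values of c; the function names are hypothetical.

```python
def satisfies_system(x1, x2, x3, x4):
    """Test the three equations of the system (1.2.11)."""
    return (x1 + x2 + 2 * x3 + x4 == 5
            and x1 + x2 + 2 * x3 + 6 * x4 == 10
            and x1 + 2 * x2 + 5 * x3 + 2 * x4 == 7)


def general_solution(c):
    """The solution with x3 = c, as read off from the reduced system."""
    return (3 + c, 1 - 3 * c, c, 1)


print(all(satisfies_system(*general_solution(c)) for c in range(-10, 11)))   # True
```

A finite check of this kind is not a proof, but here the substitution is an identity in c, which one can confirm by hand in a line or two.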

We now go back to row reduction of an arbitrary matrix. It is not hard to see that, by 
a sequence of row operations, any matrix M can be reduced to what is called a row echelon 
matrix. The end result of our reduction of (1.2.8) is an example. Here is the definition: A 
row echelon matrix is a matrix that has these properties: 


(1.2.12)

(a) If (row i) of M is zero, then (row j) is zero for all j > i.
(b) If (row i) isn’t zero, its first nonzero entry is 1. This entry is called a pivot.
(c) If (row (i + 1)) isn’t zero, the pivot in (row (i + 1)) is to the right of the pivot in (row i).
(d) The entries above a pivot are zero. (The entries below a pivot are zero too, by (c).)


The pivots in the matrix M’ of (1.2.8) and in the examples below are shown in boldface. 


To make a row reduction, find the first column that contains a nonzero entry, say 
m. (If there is none, then M is zero, and is itself a row echelon matrix.) Interchange rows 
using an elementary operation of Type (ii) to move m to the top row. Normalize m to 1 
using an operation of Type (iii). This entry becomes a pivot. Clear out the entries below 
this pivot by a sequence of operations of Type (i). The resulting matrix will have the
block form

\[
M_1 = \left[\begin{array}{c|c|c}
0 \cdots 0 & 1 & B_1 \\ \hline
0 & 0 & D_1
\end{array}\right].
\]

We now perform row operations to simplify the smaller matrix $D_1$. Because the blocks to
the left of $D_1$ are zero, these operations will have no effect on the rest of the matrix $M_1$. By
induction on the number of rows, we may assume that $D_1$ can be reduced to a row echelon
matrix, say to $D_2$, and $M_1$ is thereby reduced to the matrix

\[
M_2 = \left[\begin{array}{c|c|c}
0 \cdots 0 & 1 & B_1 \\ \hline
0 & 0 & D_2
\end{array}\right].
\]

This matrix satisfies the first three requirements for a row echelon matrix. The entries in $B_1$
above the pivots of $D_2$ can be cleared out at this time, to finish the reduction to row echelon
form. □
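The procedure just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than a transcription of the argument: the function name row_reduce is hypothetical, exact rational arithmetic is used, and the entries above each pivot are cleared as soon as the pivot is found instead of at the very end. The resulting matrix still satisfies (1.2.12).

```python
from fractions import Fraction


def row_reduce(M):
    """Return a row echelon form (1.2.12) of M, a matrix given as a list of rows."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivot_row = 0
    for col in range(cols):
        # Look for a nonzero entry in this column, at or below pivot_row.
        src = next((r for r in range(pivot_row, rows) if R[r][col] != 0), None)
        if src is None:
            continue
        R[pivot_row], R[src] = R[src], R[pivot_row]          # Type (ii)
        p = R[pivot_row][col]
        R[pivot_row] = [x / p for x in R[pivot_row]]         # Type (iii)
        for r in range(rows):                                # Type (i)
            if r != pivot_row and R[r][col] != 0:
                a = R[r][col]
                R[r] = [x - a * y for x, y in zip(R[r], R[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return R


M = [[1, 1, 2, 1, 5],
     [1, 1, 2, 6, 10],
     [1, 2, 5, 2, 7]]
print(row_reduce(M) == [[1, 0, -1, 0, 3],
                        [0, 1, 3, 0, 1],
                        [0, 0, 0, 1, 1]])    # True
```

Applied to the matrix M of (1.2.8), the function returns the matrix M’ found there.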


It can be shown that the row echelon matrix obtained from a matrix M by row reduction 
doesn’t depend on the particular sequence of operations used in the reduction. Since this 
point will not be important for us, we omit the proof. 

As we said before, row reduction is useful because one can solve a system of equations 
A’X = B’ easily when A’ is in row echelon form. Another example: Suppose that 


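\[
[A'|B'] = \left[\begin{array}{cccc|c}
1 & 6 & 0 & 1 & 1 \\
0 & 0 & 1 & 2 & 3 \\
0 & 0 & 0 & 0 & 1
\end{array}\right].
\]

There is no solution to A’X = B’ because the third equation is 0 = 1. On the other hand,

\[
[A'|B'] = \left[\begin{array}{cccc|c}
1 & 6 & 0 & 1 & 1 \\
0 & 0 & 1 & 2 & 3 \\
0 & 0 & 0 & 0 & 0
\end{array}\right]
\]

has solutions. Choosing $x_2 = c$ and $x_4 = c'$ arbitrarily, we can solve the first equation for $x_1$
and the second for $x_3$. The general rule is this: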


Proposition 1.2.13 Let M’ = [A’|B’] be a block row echelon matrix, where B’ is a column
vector. The system of equations A’X = B’ has a solution if and only if there is no pivot in the
last column B’. In that case, arbitrary values can be assigned to the unknown $x_i$, provided
that (column i) does not contain a pivot. When these arbitrary values are assigned, the other
unknowns are determined uniquely. □


Every homogeneous linear equation AX = 0 has the trivial solution X = 0. But looking
at the row echelon form again, we conclude that if there are more unknowns than equations 
then the homogeneous equation AX = 0 has a nontrivial solution. 


Corollary 1.2.14 Every system AX = 0 of m homogeneous equations in n unknowns, with
m < n, has a solution X in which some $x_i$ is nonzero.

Proof. Row reduction of the block matrix [A|0] yields a matrix [A’|0] in which A’ is in row
echelon form. The equation A’X = 0 has the same solutions as AX = 0. The number, say r,
of pivots of A’ is at most equal to the number m of rows, so it is less than n. The proposition
tells us that we may assign arbitrary values to n − r variables $x_i$. □


We now use row reduction to characterize invertible matrices. 


Lemma 1.2.15 A square row echelon matrix M is either the identity matrix I, or else its
bottom row is zero.

Proof. Say that M is an n × n row echelon matrix. Since there are n columns, there are at most
n pivots, and if there are n of them, there has to be one in each column. In this case, M = I.
If there are fewer than n pivots, then some row is zero, and the bottom row is zero too. □


Theorem 1.2.16 Let A be a square matrix. The following conditions are equivalent: 


(a) A can be reduced to the identity by a sequence of elementary row operations. 
(b) A is a product of elementary matrices. 
(c) A is invertible. 


Proof. We prove the theorem by proving the implications (a) ⇒ (b) ⇒ (c) ⇒ (a). Suppose
that A can be reduced to the identity by row operations, say $E_k \cdots E_1A = I$. Multiplying
both sides of this equation on the left by $E_1^{-1} \cdots E_k^{-1}$, we obtain $A = E_1^{-1} \cdots E_k^{-1}$. Since
the inverse of an elementary matrix is elementary, (b) holds, and therefore (a) implies (b).
Because a product of invertible matrices is invertible, (b) implies (c). Finally, we prove the
implication (c) ⇒ (a). If A is invertible, so is the end result A’ of its row reduction. Since an
invertible matrix cannot have a row of zeros, Lemma 1.2.15 shows that A’ is the identity. □


Row reduction provides a method to compute the inverse of an invertible matrix A: 
We reduce A to the identity by row operations: $E_k \cdots E_1A = I$ as above. Multiplying both
sides of this equation on the right by $A^{-1}$, we obtain

\[
E_k \cdots E_1I = E_k \cdots E_1 = A^{-1}.
\]

Corollary 1.2.17 Let A be an invertible matrix. To compute its inverse, one may apply
elementary row operations $E_1, \ldots, E_k$ to A, reducing it to the identity matrix. The same
sequence of operations, when applied to the identity matrix I, yields $A^{-1}$. □
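Example 1.2.18 We invert the matrix $A = \begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix}$. To do this, we form the 2 × 4 block
matrix

\[
[A|I] = \left[\begin{array}{cc|cc} 1 & 5 & 1 & 0 \\ 2 & 6 & 0 & 1 \end{array}\right].
\]

We perform row operations to reduce A to the identity, carrying the right side along, and
thereby end up with $A^{-1}$ on the right.

\[
\begin{aligned}
[A|I] = \left[\begin{array}{cc|cc} 1 & 5 & 1 & 0 \\ 2 & 6 & 0 & 1 \end{array}\right]
&\longrightarrow \left[\begin{array}{cc|cc} 1 & 5 & 1 & 0 \\ 0 & -4 & -2 & 1 \end{array}\right]
\longrightarrow \left[\begin{array}{cc|cc} 1 & 5 & 1 & 0 \\ 0 & 1 & \tfrac12 & -\tfrac14 \end{array}\right] \\
&\longrightarrow \left[\begin{array}{cc|cc} 1 & 0 & -\tfrac32 & \tfrac54 \\ 0 & 1 & \tfrac12 & -\tfrac14 \end{array}\right] = [I|A^{-1}].
\end{aligned}
\tag{1.2.19}
\]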
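The same reduction can be carried out step by step in Python, taking A to be the matrix with rows (1, 5) and (2, 6). The two helper functions below are hypothetical names for the row operations of Types (i) and (iii); rows are numbered from 1, and exact fractions avoid rounding.

```python
from fractions import Fraction


def add_multiple(M, i, j, a):
    """Type (i): add a*(row j) to (row i), in place."""
    M[i - 1] = [x + a * y for x, y in zip(M[i - 1], M[j - 1])]


def scale(M, i, c):
    """Type (iii): multiply (row i) by the nonzero scalar c, in place."""
    M[i - 1] = [c * x for x in M[i - 1]]


# The block matrix [A|I] with exact rational entries.
M = [[Fraction(1), Fraction(5), Fraction(1), Fraction(0)],
     [Fraction(2), Fraction(6), Fraction(0), Fraction(1)]]

add_multiple(M, 2, 1, -2)        # clear the entry below the first pivot
scale(M, 2, Fraction(-1, 4))     # normalize the second pivot to 1
add_multiple(M, 1, 2, -5)        # clear the entry above the second pivot

left = [row[:2] for row in M]
A_inv = [row[2:] for row in M]
print(left == [[1, 0], [0, 1]])                                  # True
print(A_inv == [[Fraction(-3, 2), Fraction(5, 4)],
                [Fraction(1, 2), Fraction(-1, 4)]])              # True
```

The left block has become the identity, so the right block is the inverse, in agreement with formula (1.1.17).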


Proposition 1.2.20 Let A be a square matrix that has either a left inverse or a right inverse,
a matrix B such that either BA = I or AB = I. Then A is invertible, and B is its inverse.

Proof. Suppose that AB = I. We perform row reduction on A. Say that A’ = PA, where
$P = E_k \cdots E_1$ is the product of the corresponding elementary matrices, and A’ is a row
echelon matrix. Then A’B = PAB = P. Because P is invertible, its bottom row isn’t zero.
Then the bottom row of A’ can’t be zero either. Therefore A’ is the identity matrix (1.2.15),
and so P is a left inverse of A. Then A has both a left inverse and a right inverse, so it is
invertible and B is its inverse.

If BA = I, we interchange the roles of A and B in the above reasoning. We find that B
is invertible and that its inverse is A. Then A is invertible, and its inverse is B. □


We come now to the main theorem about square systems of linear equations: 


Theorem 1.2.21 Square Systems. The following conditions on a square matrix A are 
equivalent: 


(a) A is invertible. 


(b) The system of equations AX = B has a unique solution for every column vector B. 
(c) The system of homogeneous equations AX = 0 has only the trivial solution X = 0. 


Proof. Given the system AX = B, we reduce the augmented matrix [A|B] to row echelon
form [A’|B’]. The system A’X = B’ has the same solutions. If A is invertible, then A’ is the
identity matrix, so the unique solution is X = B’. This shows that (a) ⇒ (b).

If an n × n matrix A is not invertible, then A’ has a row of zeros. One of the equations
making up the system A’X = 0 is the trivial equation. So there are fewer than n pivots.
The homogeneous system A’X = 0 has a nontrivial solution (1.2.13), and so does AX = 0
(1.2.14). This shows that if (a) fails, then (c) also fails, hence that (c) ⇒ (a).
Finally, it is obvious that (b) ⇒ (c). □


We want to take particular note of the implication (c) ⇒ (b) of the theorem:


If the homogeneous equation AX = 0 has only the trivial solution, 
then the general equation AX = B has a unique solution for every column vector B. 


This can be useful because the homogeneous system may be easier to handle than the general 
system. 


Example 1.2.22 There exists a polynomial p(t) of degree at most n that takes prescribed values, say
$p(a_i) = b_i$, at n + 1 distinct points $t = a_0, \ldots, a_n$ on the real line.² To find this polynomial,
one must solve a system of linear equations in the undetermined coefficients of p(t). In
order not to overload the notation, we’ll do the case n = 2, so that

\[
p(t) = x_0 + x_1t + x_2t^2.
\]

Let $a_0, a_1, a_2$ and $b_0, b_1, b_2$ be given. The equations to be solved are obtained by substituting
$a_i$ for t. Moving the coefficients $x_i$ to the right, they are

\[
x_0 + a_ix_1 + a_i^2x_2 = b_i
\]

for i = 0, 1, 2. This is a system AX = B of three linear equations in the three unknowns
$x_0, x_1, x_2$, with

\[
A = \begin{bmatrix} 1 & a_0 & a_0^2 \\ 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \end{bmatrix}.
\]

The homogeneous equation, in which B = 0, asks for a polynomial with 3 roots $a_0, a_1, a_2$. A
nonzero polynomial of degree at most 2 can have at most two roots, so the homogeneous equation
has only the trivial solution. Therefore there is a unique solution for every set of prescribed
values $b_0, b_1, b_2$.

By the way, there is a formula, the Lagrange Interpolation Formula, that exhibits the
polynomial p(t) explicitly. □
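For readers who want to experiment, here is a sketch of the Lagrange Interpolation Formula in Python. The formula used, $p(t) = \sum_i b_i \prod_{j \neq i} (t - a_j)/(a_i - a_j)$, is the standard one and is stated here as background rather than taken from the text; the function name lagrange and the sample data are assumptions made for the illustration.

```python
from fractions import Fraction


def lagrange(points, values, t):
    """Evaluate at the integer t the interpolating polynomial through the data."""
    total = Fraction(0)
    for i, (a_i, b_i) in enumerate(zip(points, values)):
        term = Fraction(b_i)
        for j, a_j in enumerate(points):
            if j != i:
                term *= Fraction(t - a_j, a_i - a_j)
        total += term
    return total


points = [0, 1, 2]      # a_0, a_1, a_2 (sample data)
values = [1, 3, 11]     # b_0, b_1, b_2 (sample data)


def p(t):
    """For these data, solving AX = B gives x_0 = 1, x_1 = -1, x_2 = 3."""
    return 1 - t + 3 * t * t


print(all(lagrange(points, values, t) == p(t) for t in range(-3, 7)))    # True
```

Both expressions are polynomials of degree at most 2 taking the same values at three distinct points, so by the uniqueness just proved they agree everywhere, which the check confirms at a few sample points.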


1.3. THE MATRIX TRANSPOSE 


In the discussion of the previous section, we chose to work with rows in order to apply the 
results to systems of linear equations. One may also perform column operations to simplify 
a matrix, and it is evident that similar results will be obtained. 


² Elements of a set are said to be distinct if no two of them are equal.
Rows and columns are interchanged by the transpose operation on matrices. The
transpose of an m × n matrix A is the n × m matrix $A^t$ obtained by reflecting about the
diagonal: $A^t = (b_{ij})$, where $b_{ij} = a_{ji}$. For instance,


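\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^t = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}^t = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.
\]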
Here are the rules for computing with the transpose: 
\[
(AB)^t = B^tA^t, \quad (A + B)^t = A^t + B^t, \quad (cA)^t = cA^t, \quad (A^t)^t = A.
\tag{1.3.1}
\]


Using the first of these formulas, we can deduce facts about right multiplication from the 
corresponding facts about left multiplication. The elementary matrices (1.2.4) act by right 
multiplication AE as the following elementary column operations 


(1.3.2) “with a in the i, j position, add a·(column i) to (column j)”;
“interchange (column i) and (column j)”;
“multiply (column i) by a nonzero scalar c.”


Note that in the first of these operations, the indices i, j are the reverse of those in (1.2.5a). 
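Both the first transpose rule and the reversal of indices for column operations can be seen in a small computation. In the Python sketch below, the helper names and the sample matrices are assumptions made for the illustration; E is the elementary matrix of Type (i) with a = 10 in the (1, 3) position.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [2, 1], [0, 3]]
print(transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A)))   # True

E = [[1, 0, 10],
     [0, 1, 0],
     [0, 0, 1]]
# Right multiplication adds 10*(column 1) of A to (column 3).
print(mat_mul(A, E))                  # [[1, 2, 13], [4, 5, 46]]
# Left multiplication by the same E adds 10*(row 3) to (row 1).
print(mat_mul(E, [[1], [2], [3]]))    # [[31], [2], [3]]
```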


1.4 DETERMINANTS 


Every square matrix A has a number associated to it called its determinant, and denoted by 
det A. We define the determinant and derive some of its properties here. 
The determinant of a 1 × 1 matrix is equal to its single entry

\[
\det\,[a] = a,
\tag{1.4.1}
\]

and the determinant of a 2 × 2 matrix is given by the formula

\[
\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc.
\tag{1.4.2}
\]

The determinant of a 2 × 2 matrix A has a geometric interpretation. Left multiplication
by A maps the space $\mathbb{R}^2$ of real two-dimensional column vectors to itself, and the area of
the parallelogram that forms the image of the unit square via this map is the absolute value
of the determinant of A. The determinant is positive or negative, according to whether the
orientation of the square is preserved or reversed by the operation. Moreover, det A = 0 if
and only if the parallelogram degenerates to a line segment or a point, which happens when
the columns of the matrix are proportional.


A picture of this operation, in which the matrix is $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$, is shown on the following
page. The shaded region is the image of the unit square under the map. Its area is 10.

This geometric interpretation extends to higher dimensions. Left multiplication by a
3 × 3 real matrix A maps the space $\mathbb{R}^3$ of three-dimensional column vectors to itself, and the
absolute value of its determinant is the volume of the image of the unit cube.
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(1.4.3) . 
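The area statement can be tested directly. The sketch below assumes, for illustration, the sample matrix with rows (3, 2) and (1, 4), whose determinant is 10. It maps the corners of the unit square and measures the image with the shoelace formula, a standard signed-area formula for polygons that is not part of the text. The names `apply`, `signed_area` and `det2` are hypothetical helpers.

```python
def apply(A, v):
    """Left multiplication of the column vector v = (x, y) by a 2 x 2 matrix A."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])


def signed_area(points):
    """Shoelace formula: positive when the vertices run counterclockwise."""
    total = 0
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2


def det2(A):
    """Formula (1.4.2) for a 2 x 2 determinant."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # counterclockwise corners

A = [[3, 2], [1, 4]]
image = [apply(A, v) for v in unit_square]

# Swapping the columns reverses the orientation, so the signed area changes sign.
A_reversed = [[2, 3], [4, 1]]
image_reversed = [apply(A_reversed, v) for v in unit_square]
```

For this matrix the signed area of `image` equals `det2(A)`, and for the column-swapped matrix both quantities are negative, in line with the remark about orientation.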


The set of all real n×n matrices forms a space of dimension $n^2$ that we denote by
$\mathbb{R}^{n\times n}$. We regard the determinant of n×n matrices as a function from this space to the real
numbers:

$\det : \mathbb{R}^{n\times n} \to \mathbb{R}.$

The determinant of an n×n matrix is a function of its $n^2$ entries. There is one such function
for each positive integer n. Unfortunately, there are many formulas for these determinants, 
and all of them are complicated when n is large. Not only are the formulas complicated, but 
it may not be easy to show directly that two of them define the same function. 

We use the following strategy: We choose one of the formulas, and take it as our 
definition of the determinant. In that way we are talking about a particular function. We 
show that our chosen function is the only one having certain special properties. Then, to 
show that another formula defines the same determinant function, one needs only to check 
those properties for the other function. This is often not too difficult. 


We use a formula that computes the determinant of an n×n matrix in terms of certain
(n − 1)×(n − 1) determinants by a process called expansion by minors. The determinants of
submatrices of a matrix are called minors. Expansion by minors allows us to give a recursive
definition of the determinant.

The word recursive means that the definition of the determinant for n×n matrices
makes use of the determinant for (n − 1)×(n − 1) matrices. Since we have defined the
determinant for 1×1 matrices, we will be able to use our recursive definition to compute
2×2 determinants, then knowing this, to compute 3×3 determinants, and so on.

Let A be an n×n matrix and let $A_{ij}$ denote the (n − 1)×(n − 1) submatrix obtained
by crossing out the ith row and the jth column of A:


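(1.4.4) [Display: the matrix A with its ith row and jth column crossed out; the entries that remain form $A_{ij}$.]

For example, if

$$A = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 2 \\ 0 & 5 & 1 \end{bmatrix}, \quad\text{then}\quad A_{21} = \begin{bmatrix} 0 & 3 \\ 5 & 1 \end{bmatrix}.$$

• Expansion by minors on the first column is the formula

(1.4.5) $\det A = a_{11}\det A_{11} - a_{21}\det A_{21} + a_{31}\det A_{31} - \cdots \pm a_{n1}\det A_{n1}.$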


The signs alternate, beginning with +. 


It is useful to write this expansion in summation notation: 


(1.4.6) $\det A = \sum_{\nu=1}^{n} \pm\, a_{\nu 1}\det A_{\nu 1}.$


The alternating sign can be written as $(-1)^{\nu+1}$. It will appear again. We take this formula,
together with (1.4.1), as a recursive definition of the determinant.

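For 1×1 and 2×2 matrices, this formula agrees with (1.4.1) and (1.4.2). The determinant
of the 3×3 matrix A shown above is

$$\det A = 1\cdot\det\begin{bmatrix} 1 & 2 \\ 5 & 1 \end{bmatrix} - 2\cdot\det\begin{bmatrix} 0 & 3 \\ 5 & 1 \end{bmatrix} + 0\cdot\det\begin{bmatrix} 0 & 3 \\ 1 & 2 \end{bmatrix} = 1\cdot(-9) - 2\cdot(-15) = 21.$$
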
Expansions by minors on other columns and on rows, which we define in Section 1.6, are 
among the other formulas for the determinant. 
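The recursive definition translates almost word for word into a program. What follows is a minimal sketch, assuming a square matrix stored as a list of rows with 0-based indices, so that the sign $(-1)^{\nu+1}$ becomes `(-1) ** i` for `i = ν − 1`. The names `minor` and `det` are illustrative.

```python
def minor(A, i, j):
    """Submatrix of A with row i and column j crossed out (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion by minors on the first column, as in (1.4.5)."""
    n = len(A)
    if n == 1:
        return A[0][0]          # formula (1.4.1)
    # Signs alternate beginning with +.
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))


A = [[1, 0, 3],
     [2, 1, 2],
     [0, 5, 1]]
value = det(A)
```

Each call on an n×n matrix makes n calls on (n − 1)×(n − 1) matrices, so this sketch is useful for checking small examples rather than for large computations.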

It is important to know the many special properties satisfied by determinants. We
present some of these properties here, deferring proofs to the end of the section. Because
we want to apply the discussion to other formulas, the properties will be stated for an
unspecified function δ.


Theorem 1.4.7 Uniqueness of the Determinant. There is a unique function δ on the space of
n×n matrices with the properties below, namely the determinant (1.4.5).
(i) With I denoting the identity matrix, δ(I) = 1.
(ii) δ is linear in the rows of the matrix A.
(iii) If two adjacent rows of a matrix A are equal, then δ(A) = 0.


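The statement that δ is linear in the rows of a matrix means this: Let $A_i$ denote the ith row
of a matrix A. Let A, B, D be three matrices, all of whose entries are equal, except for those
in the rows indexed by k. Suppose furthermore that $D_k = cA_k + c'B_k$ for some scalars c and
c′. Then δ(D) = cδ(A) + c′δ(B):

(1.4.8) $\delta\begin{bmatrix} \vdots \\ cA_k + c'B_k \\ \vdots \end{bmatrix} = c\,\delta\begin{bmatrix} \vdots \\ A_k \\ \vdots \end{bmatrix} + c'\,\delta\begin{bmatrix} \vdots \\ B_k \\ \vdots \end{bmatrix}.$

This allows us to operate on one row at a time, the other rows being left fixed. For example,
since $[\,0\ \ 2\ \ 3\,] = 2\,[\,0\ \ 1\ \ 0\,] + 3\,[\,0\ \ 0\ \ 1\,]$,

$$\delta\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 3 \\ 0 & 0 & 1 \end{bmatrix} = 2\,\delta\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} + 3\,\delta\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} = 2\cdot 1 + 3\cdot 0 = 2.$$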


Perhaps the most important property of the determinant is its compatibility with matrix 
multiplication. 


Theorem 1.4.9 Multiplicative Property of the Determinant. For any n×n matrices A and B,
det(AB) = (det A)(det B).


The next theorem gives additional properties that are implied by those listed in (1.4.7). 


Theorem 1.4.10 Let δ be a function on n×n matrices that has the properties (1.4.7)(i),(ii),(iii).
Then


(a) If A′ is obtained from A by adding a multiple of (row j) of A to (row i) and i ≠ j, then
δ(A′) = δ(A).

(b) If A′ is obtained by interchanging (row i) and (row j) of A and i ≠ j, then
δ(A′) = −δ(A).

(c) If A′ is obtained from A by multiplying (row i) by a scalar c, then δ(A′) = cδ(A).
If a row of a matrix A is equal to zero, then δ(A) = 0.

(d) If (row i) of A is equal to a multiple of (row j) and i ≠ j, then δ(A) = 0.
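The four rules can be observed on a small example before they are proved. The sketch below takes δ to be the determinant computed by expansion on the first column, as in (1.4.5), and applies one operation of each type to a sample 3×3 matrix. The helper `det` and the particular rows and scalars are illustrative assumptions, not taken from the text.

```python
def det(A):
    """Determinant by expansion by minors on the first column (1.4.5)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # Cross out row i and the first column.
        sub = [row[1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** i * A[i][0] * det(sub)
    return total


A = [[1, 0, 3],
     [2, 1, 2],
     [0, 5, 1]]

# (a) add 4*(row 3) to (row 1): the value is unchanged
A_a = [[x + 4 * y for x, y in zip(A[0], A[2])], A[1], A[2]]

# (b) interchange (row 1) and (row 3): the sign changes
A_b = [A[2], A[1], A[0]]

# (c) multiply (row 2) by c = 7: the value is multiplied by 7
A_c = [A[0], [7 * x for x in A[1]], A[2]]

# (d) make (row 3) a multiple of (row 1): the value is zero
A_d = [A[0], A[1], [2 * x for x in A[0]]]

values = (det(A), det(A_a), det(A_b), det(A_c), det(A_d))
```

With det A = 21, the tuple `values` comes out as (21, 21, −21, 147, 0).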


We now proceed to prove the three theorems stated above, in reverse order. The fact 
that there are quite a few points to be examined makes the proofs lengthy. This can’t be 
helped. 


Proof of Theorem 1.4.10. The first assertion of (c) is a part of linearity in rows (1.4.7)(ii).
The second assertion of (c) follows, because a row that is zero can be multiplied by 0 without
changing the matrix, and it multiplies δ(A) by 0.


Next, we verify properties (a),(b),(d) when i and j are adjacent indices, say j = i + 1. To
simplify our display, we represent the matrices schematically, denoting the rows in question
by R = (row i) and S = (row j), and suppressing notation for the other rows. So $\begin{bmatrix} R \\ S \end{bmatrix}$
denotes our given matrix A. Then by linearity in the ith row,

(1.4.11) $\delta\begin{bmatrix} R + cS \\ S \end{bmatrix} = \delta\begin{bmatrix} R \\ S \end{bmatrix} + c\,\delta\begin{bmatrix} S \\ S \end{bmatrix}.$

The first term on the right side is δ(A), and the second is zero (1.4.7). This proves (a) for
adjacent indices. To verify (b) for adjacent indices, we use (a) repeatedly. Denoting the rows
by R and S as before:

(1.4.12) $\delta\begin{bmatrix} R \\ S \end{bmatrix} = \delta\begin{bmatrix} R - S \\ S \end{bmatrix} = \delta\begin{bmatrix} R - S \\ S + (R - S) \end{bmatrix} = \delta\begin{bmatrix} R - S \\ R \end{bmatrix} = \delta\begin{bmatrix} -S \\ R \end{bmatrix} = -\delta\begin{bmatrix} S \\ R \end{bmatrix}.$

Finally, (d) for adjacent indices follows from (c) and (1.4.7)(iii).


To complete the proof, we verify (a),(b),(d) for an arbitrary pair of distinct indices.
Suppose that (row i) is a multiple of (row j). We switch adjacent rows a few times to obtain
a matrix A′ in which the two rows in question are adjacent. Then (d) for adjacent rows tells
us that δ(A′) = 0, and (b) for adjacent rows tells us that δ(A′) = ±δ(A). So δ(A) = 0, and
this proves (d). At this point, the proofs that we have given for (a) and (b) in the case of
adjacent indices carry over to an arbitrary pair of indices. □


The rules (1.4.10)(a),(b),(c) show how multiplication by an elementary matrix affects
δ, and they lead to the next corollary.


Corollary 1.4.13 Let δ be a function on n×n matrices with the properties (1.4.7), and let E
be an elementary matrix. For any matrix A, δ(EA) = δ(E)δ(A). Moreover,


(i) If E is of the first kind (add a multiple of one row to another), then δ(E) = 1.
(ii) If E is of the second kind (row interchange), then δ(E) = −1.
(iii) If E is of the third kind (multiply a row by c), then δ(E) = c.


Proof. The rules (1.4.10)(a),(b),(c) describe the effect of an elementary row operation on
δ(A), so they tell us how to compute δ(EA) from δ(A). They tell us that δ(EA) = εδ(A),
where ε = 1, −1, or c according to the type of elementary matrix. By setting A = I, we find
that δ(E) = δ(EI) = εδ(I) = ε. □


Proof of the multiplicative property, Theorem 1.4.9. We imagine the first step of a row
reduction of A, say EA = A′. Suppose we have shown that δ(A′B) = δ(A′)δ(B). We apply
Corollary 1.4.13: δ(E)δ(A) = δ(A′). Since A′B = E(AB), the corollary also tells us that
δ(A′B) = δ(E)δ(AB). Thus


δ(E)δ(AB) = δ(A′B) = δ(A′)δ(B) = δ(E)δ(A)δ(B).


Canceling δ(E), which is nonzero by Corollary 1.4.13, we see that the multiplicative property
is true for A and B as well. This being so, induction shows that it suffices to prove the
multiplicative property after row-reducing A. So we may suppose that A is row reduced. Then
A is either the identity, or else its bottom row is zero. The property is obvious when A = I.
If the bottom row of A is zero, so is the bottom row of AB, and Theorem 1.4.10 shows that
δ(A) = δ(AB) = 0. The property is true in this case as well. □


Proof of uniqueness of the determinant, Theorem 1.4.7. There are two parts. To prove
uniqueness, we perform row reduction on a matrix A, say $A' = E_k \cdots E_1 A$. Corollary 1.4.13 tells us
how to compute δ(A) from δ(A′). If A′ is the identity, then δ(A′) = 1. Otherwise the bottom
row of A′ is zero, and in that case Theorem 1.4.10 shows that δ(A′) = 0. This determines
δ(A) in both cases.
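The uniqueness argument is also a practical way to compute. The sketch below row-reduces a matrix while recording the factor contributed by each elementary operation according to Corollary 1.4.13. It assumes exact arithmetic with Python's `fractions.Fraction`; the function name `det_by_row_reduction` and the choice of pivots are illustrative, not prescribed by the text.

```python
from fractions import Fraction


def det_by_row_reduction(A):
    """Compute the determinant of a square matrix by row reduction."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            # No pivot: the row reduced matrix has a zero bottom row.
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result            # a row interchange multiplies by -1
        c = M[col][col]
        M[col] = [x / c for x in M[col]]
        result *= c                     # undoing "multiply the row by 1/c"
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                # Adding a multiple of one row to another changes nothing.
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # M is now the identity, whose determinant is 1.
    return result


d = det_by_row_reduction([[1, 0, 3], [2, 1, 2], [0, 5, 1]])
```

For the 3×3 matrix used earlier in this section the result agrees with the value 21 obtained by expansion by minors.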
Note: It is a natural idea to try defining determinants using compatibility with multiplication
and Corollary 1.4.13. Since we can write an invertible matrix as a product of elementary 
matrices, these properties determine the determinant of every invertible matrix. But there 
are many ways to write a given matrix as such a product. Without going through some steps 
as we have, it won’t be clear that two such products will give the same answer. It isn’t easy 
to make this idea work. 


To complete the proof of Theorem 1.4.7, we must show that the determinant function
(1.4.5) we have defined has the properties (1.4.7). This is done by induction on the size of the
matrices. We note that the properties (1.4.7) are true when n = 1, in which case det [a] = a.
So we assume that they have been proved for determinants of (n − 1)×(n − 1) matrices.
Then all of the properties (1.4.7), (1.4.10), (1.4.13), and (1.4.9) are true for (n − 1)×(n − 1)
matrices. We proceed to verify (1.4.7) for the function δ = det defined by (1.4.5), and for
n×n matrices. For reference, they are:


(i) With I denoting the identity matrix, det(I) = 1.
(ii) det is linear in the rows of the matrix A.
(iii) If two adjacent rows of a matrix A are equal, then det(A) = 0.


(i) If $A = I_n$, then $a_{11} = 1$ and $a_{\nu 1} = 0$ when ν > 1. The expansion (1.4.5) reduces
to $\det(A) = 1\cdot\det(A_{11})$. Moreover, $A_{11} = I_{n-1}$, so by induction, $\det(A_{11}) = 1$ and
$\det(I_n) = 1$.

(ii) To prove linearity in the rows, we return to the notation introduced in (1.4.8). We show
linearity of each of the terms in the expansion (1.4.5), i.e., that

(1.4.14) $d_{\nu 1}\det(D_{\nu 1}) = c\,a_{\nu 1}\det(A_{\nu 1}) + c'\,b_{\nu 1}\det(B_{\nu 1})$

for every index ν. Let k be as in (1.4.8).

Case 1: ν = k. The row that we operate on has been deleted from the minors $A_{k1}$, $B_{k1}$, $D_{k1}$, so
they are equal, and the values of det on them are equal too. On the other hand, $a_{k1}$, $b_{k1}$, $d_{k1}$
are the first entries of the rows $A_k$, $B_k$, $D_k$, respectively. So $d_{k1} = c\,a_{k1} + c'\,b_{k1}$, and (1.4.14)
follows.

Case 2: ν ≠ k. If we let $A'_k$, $B'_k$, $D'_k$ denote the vectors obtained from the rows $A_k$, $B_k$, $D_k$,
respectively, by dropping the first entry, then $A'_k$ is a row of the minor $A_{\nu 1}$, etc. Here
$D'_k = cA'_k + c'B'_k$, and by induction on n, $\det(D_{\nu 1}) = c\det(A_{\nu 1}) + c'\det(B_{\nu 1})$. On the
other hand, since ν ≠ k, the coefficients $a_{\nu 1}$, $b_{\nu 1}$, $d_{\nu 1}$ are equal. So (1.4.14) is true in this case
as well.

(iii) Suppose that rows k and k + 1 of a matrix A are equal. Unless ν = k or k + 1, the minor
$A_{\nu 1}$ has two rows equal, and its determinant is zero by induction. Therefore, at most two
terms in (1.4.5) are different from zero. On the other hand, deleting either of the equal rows
gives us the same matrix. So $a_{k1} = a_{k+1,1}$ and $A_{k1} = A_{k+1,1}$. Then

$$\det(A) = \pm\, a_{k1}\det(A_{k1}) \mp a_{k+1,1}\det(A_{k+1,1}) = 0.$$

This completes the proof of Theorem 1.4.7. □
Corollary 1.4.15


(a) A square matrix A is invertible if and only if its determinant is different from zero. If A
is invertible, then $\det(A^{-1}) = (\det A)^{-1}$.

(b) The determinant of a matrix A is equal to the determinant of its transpose $A^t$.

(c) Properties (1.4.7) and (1.4.10) continue to hold if the word row is replaced by the word
column throughout.


Proof. (a) If A is invertible, then it is a product of elementary matrices, say $A = E_1 \cdots E_k$
(1.2.16). Then $\det A = (\det E_1)\cdots(\det E_k)$. The determinants of elementary matrices are
nonzero (1.4.13), so det A is nonzero too. If A is not invertible, there are elementary matrices
$E_1, \ldots, E_k$ such that the bottom row of $A' = E_1 \cdots E_k A$ is zero (1.2.15). Then det A′ = 0, and
det A = 0 as well. If A is invertible, then $\det(A^{-1})\det A = \det(A^{-1}A) = \det I = 1$, therefore
$\det(A^{-1}) = (\det A)^{-1}$.


(b) It is easy to check that $\det E = \det E^t$ if E is an elementary matrix. If A is invertible,
we write $A = E_1 \cdots E_k$ as before. Then $A^t = E_k^t \cdots E_1^t$, and by the multiplicative property,
$\det A = \det A^t$. If A is not invertible, neither is $A^t$. Then both det A and $\det A^t$ are zero.


(c) This follows from (b). □


1.5 PERMUTATIONS 


A permutation of a set S is a bijective map p from a set S to itself: 


(1.5.1) p: S → S.

The table

(1.5.2)
    i    | 1  2  3  4  5
    p(i) | 3  5  4  1  2


exhibits a permutation p of the set {1, 2, 3, 4, 5} of five indices: p(1) = 3, etc. It is bijective 
because every index appears exactly once in the bottom row. 

The set of all permutations of the indices {1, 2, ..., n} is called the symmetric group,
and is denoted by $S_n$. It will be discussed in Chapter 2.

The benefit of this definition of a permutation is that it permits composition of
permutations to be defined as composition of functions. If q is another permutation, then
doing first p then q means composing the functions: q ∘ p. The composition is called the
product permutation, and will be denoted by qp.


Note: People sometimes like to think of a permutation of the indices 1, ...,n as a list of 
the same indices in a different order, as in the bottom row of (1.5.2). This is not good for 
us. In mathematics one wants to keep track of what happens when one performs two or 
more permutations in succession. For instance, we may want to obtain a permutation by 
repeatedly switching pairs of indices. Then unless things are written carefully, keeping track 
of what has been done becomes a nightmare. □


The tabular form shown above is cumbersome. It is more common to use cycle notation. 
To write a cycle notation for the permutation p shown above, we begin with an arbitrary 
index, say 3, and follow it along: p(3) = 4, p(4) = 1, and p(1) = 3. The string of three
indices forms a cycle for the permutation, which is denoted by


(1.5.3) (341).


This notation is interpreted as follows: the index 3 is sent to 4, the index 4 is sent to 1, and
the parenthesis at the end indicates that the index 1 is sent back to 3 at the front by the
permutation:

[Figure: the indices 3, 4, 1 placed around a loop, with arrows 3 → 4 → 1 → 3.]

Because there are three indices, this is a 3-cycle. 
Also, p(2) = 5 and p(5) = 2, so with the analogous notation, the two indices 2, 5 form 
a 2-cycle (25). 2-cycles are called transpositions. 


The complete cycle notation for p is obtained by writing these cycles one after the 
other: 


(1.5.4) p = (341) (25). 


The permutation can be read off easily from this notation. 
One slight complication is that the cycle notation isn’t unique, for two reasons. First, 
we might have started with an index different from 3. Thus 


(341), (134) and (413) 


are notations for the same 3-cycle. Second, the order in which the cycles are written doesn’t 
matter. Cycles made up of disjoint sets of indices can be written in any order. We might just
as well write

p = (52)(134).


The indices (which are 1, 2, 3, 4, 5 here) may be grouped into cycles arbitrarily, and the 
result will be a cycle notation for some permutation. For example, (34)(2)(15) represents 
the permutation that switches two pairs of indices, while fixing 2. However, 1-cycles, the 
indices that are left fixed, are often omitted from the cycle notation. We might write this 
permutation as (3 4)(15). The 4-cycle 


(1.5.5) q = (1452)


is interpreted as meaning that the missing index 3 is left fixed. Then in a cycle notation for a 
permutation, every index appears at most once. (Of course this convention assumes that the 
set of indices is known.) The one exception to this rule is for the identity permutation. We’d 
rather not use the empty symbol to denote this permutation, so we denote it by 1. 

To compute the product permutation qp, with p and q as above, we follow the indices
through the two permutations, but we must remember that qp means q ∘ p, “first do p, then
q.” So since p sends 3 → 4 and q sends 4 → 5, qp sends 3 → 5. Unfortunately, we read
cycles from left to right, but we have to run through the permutations from right to left, in a
zig-zag fashion. This takes some getting used to, but in the end it is not difficult. The result
in our case is a 3-cycle:

    qp = [(1452)] ∘ [(341)(25)] = (135),
         (then this)  (first do this)

the missing indices 2 and 4 being left fixed. On the other hand,

    pq = (234).


Composition of permutations is not a commutative operation. 
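Treating permutations as functions makes the zig-zag bookkeeping mechanical. The following is a minimal sketch, assuming a permutation is stored as a Python dictionary sending each index i to p(i); the helper names `compose` and `cycles` are illustrative.

```python
def compose(q, p):
    """Return the product qp = q o p: first apply p, then q."""
    return {i: q[p[i]] for i in p}


def cycles(perm):
    """Cycle notation as a list of tuples, with 1-cycles omitted."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = perm[i]
        if len(cycle) > 1:
            result.append(tuple(cycle))
    return result


p = {1: 3, 2: 5, 3: 4, 4: 1, 5: 2}   # p = (341)(25), the table (1.5.2)
q = {1: 4, 2: 1, 3: 3, 4: 5, 5: 2}   # q = (1452), as in (1.5.5)

qp = cycles(compose(q, p))   # first p, then q
pq = cycles(compose(p, q))   # first q, then p
```

Each cycle is listed starting from its smallest index, so `qp` comes out as the single cycle (1, 3, 5) and `pq` as (2, 3, 4), matching the computation in the text.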


There is a permutation matrix P associated to any permutation p. Left multiplication 
by this permutation matrix permutes the entries of a vector X using the permutation p. 

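For example, if there are three indices, the matrix P associated to the cyclic permutation
p = (123) and its operation on a column vector are as follows:

(1.5.6) $P = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}; \qquad PX = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_3 \\ x_1 \\ x_2 \end{bmatrix}.$

Multiplication by P shifts the first entry of the vector X to the second position and so on.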

It is essential to write the matrix of an arbitrary permutation down carefully, and to 
check that the matrix associated to a product pq of permutations is the product matrix PQ. 
The matrix associated to a transposition (25) is an elementary matrix of the second type, 
the one that interchanges the two corresponding rows. This is easy to see. But for a general 
permutation, determining the matrix can be confusing. 


• To write a permutation matrix explicitly, it is best to use the n×n matrix units $e_{ij}$, the
matrices with a single 1 in the i, j position that were defined before (1.1.21). The matrix
associated to a permutation p of $S_n$ is

(1.5.7) $P = \sum_i e_{pi,i}.$

(In order to make the subscript as compact as possible, we have written pi for p(i).)

This matrix acts on the vector $X = \sum_j e_j x_j$ as follows:

(1.5.8) $PX = \Bigl(\sum_i e_{pi,i}\Bigr)\Bigl(\sum_j e_j x_j\Bigr) = \sum_{i,j} e_{pi,i}\,e_j\,x_j = \sum_i e_{pi,i}\,e_i\,x_i = \sum_i e_{pi}\,x_i.$

This computation is made using formula (1.1.25). The terms $e_{pi,i}e_j$ in the double sum are
zero when i ≠ j.

To express the right side of (1.5.8) as a column vector, we have to reindex so that the
standard basis vectors on the right are in the correct order, $e_1, \ldots, e_n$, rather than in the
permuted order $e_{p1}, \ldots, e_{pn}$. We set pi = k and $i = p^{-1}k$. Then

(1.5.9) $\sum_i e_{pi}\,x_i = \sum_k e_k\,x_{p^{-1}k}.$

This is a confusing point: Permuting the entries $x_i$ of a vector by p permutes the
indices by $p^{-1}$.

For example, the 3×3 matrix P of (1.5.6) is $e_{21} + e_{32} + e_{13}$, and

$$PX = (e_{21} + e_{32} + e_{13})(e_1x_1 + e_2x_2 + e_3x_3) = e_1x_3 + e_2x_1 + e_3x_2.$$
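Formulas (1.5.7) and (1.5.9) can be checked on the example (1.5.6). This is a sketch under the assumption that a permutation is a dictionary on the indices 1, ..., n and that vectors are Python lists. The names `perm_matrix` and `apply` are hypothetical, and the numeric vector stands in for $(x_1, x_2, x_3)$.

```python
def perm_matrix(p, n):
    """P = sum over i of e_{p(i), i}: a 1 in row p(i), column i (1-based)."""
    P = [[0] * n for _ in range(n)]
    for i in range(1, n + 1):
        P[p[i] - 1][i - 1] = 1
    return P


def apply(P, X):
    """Left multiplication of the column vector X by the matrix P."""
    return [sum(a * x for a, x in zip(row, X)) for row in P]


p = {1: 2, 2: 3, 3: 1}                 # the cyclic permutation (123)
p_inverse = {v: k for k, v in p.items()}

P = perm_matrix(p, 3)
X = [10, 20, 30]                       # stands for (x1, x2, x3)
PX = apply(P, X)

# Formula (1.5.9): the k-th entry of PX is x_{p^{-1} k}.
reindexed = [X[p_inverse[k] - 1] for k in range(1, 4)]
```

The first entry of X lands in the second position of `PX`, and `PX` coincides with the list `reindexed` built from the inverse permutation.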


Proposition 1.5.10 


(a) A permutation matrix P always has a single 1 in each row and in each column, the rest 
of its entries being 0. Conversely, any such matrix is a permutation matrix. 

(b) The determinant of a permutation matrix is ±1.

(c) Let p and q be two permutations, with associated permutation matrices P and Q. The 
matrix associated to the permutation pq is the product PQ. 


Proof. We omit the verification of (a) and (b). The computation below proves (c):

$$PQ = \Bigl(\sum_i e_{pi,i}\Bigr)\Bigl(\sum_j e_{qj,j}\Bigr) = \sum_{i,j} e_{pi,i}\,e_{qj,j} = \sum_j e_{pqj,qj}\,e_{qj,j} = \sum_j e_{pqj,j}.$$

This computation is made using formula (1.1.23). The terms $e_{pi,i}e_{qj,j}$ in the double sum are
zero unless i = qj. So PQ is the permutation matrix associated to the product permutation
pq, as claimed. □


• The determinant of the permutation matrix associated to a permutation p is called the
sign of the permutation:


(1.5.11) sign p = det P = ±1.


A permutation p is even if its sign is +1, and odd if its sign is -1. The permutation (123) has 
sign +1. It is even, while any transposition, such as (12), has sign -1 and is odd. 

Every permutation can be written as a product of transpositions in many ways. If a
permutation p is equal to the product $\tau_1 \cdots \tau_k$, where the $\tau_i$ are transpositions, the number k
will always be even if p is an even permutation and it will always be odd if p is an odd
permutation.
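The parity statement suggests a way to compute signs without forming a matrix. The sketch below relies on a standard fact that the text does not state: a cycle of length k is a product of k − 1 transpositions, so it contributes a factor $(-1)^{k-1}$. Permutations are assumed to be dictionaries, and the name `sign` is an illustrative choice.

```python
def sign(perm):
    """Sign of a permutation given as a dict i -> perm[i]."""
    seen, s = set(), 1
    for start in perm:
        if start in seen:
            continue
        # Walk around the cycle containing `start` and measure its length.
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        s *= (-1) ** (length - 1)
    return s


three_cycle = {1: 2, 2: 3, 3: 1}            # (123)
transposition = {1: 2, 2: 1}                # (12)
p = {1: 3, 2: 5, 3: 4, 4: 1, 5: 2}          # (341)(25)

signs = (sign(three_cycle), sign(transposition), sign(p))
```

The 3-cycle is even and the transposition is odd, as stated in the text. The permutation (341)(25) is a product of an even and an odd cycle, hence odd.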

This completes our discussion of permutations and permutation matrices. We will come 
back to them in Chapters 7 and 10. 


1.6 OTHER FORMULAS FOR THE DETERMINANT 


There are formulas analogous to our definition (1.4.5) of the determinant that use expansions
by minors on other columns of a matrix, and also ones that use expansions on rows.
Again, the notation $A_{ij}$ stands for the matrix obtained by deleting the ith row and the
jth column of a matrix A.


Expansion by minors on the jth column:

$$\det A = (-1)^{1+j}a_{1j}\det A_{1j} + (-1)^{2+j}a_{2j}\det A_{2j} + \cdots + (-1)^{n+j}a_{nj}\det A_{nj},$$

or in summation notation,

(1.6.1) $\det A = \sum_{\nu=1}^{n} (-1)^{\nu+j}\,a_{\nu j}\det A_{\nu j}.$


Expansion by minors on the ith row:

$$\det A = (-1)^{i+1}a_{i1}\det A_{i1} + (-1)^{i+2}a_{i2}\det A_{i2} + \cdots + (-1)^{i+n}a_{in}\det A_{in},$$

(1.6.2) $\det A = \sum_{\nu=1}^{n} (-1)^{i+\nu}\,a_{i\nu}\det A_{i\nu}.$


For example, expansion on the second row gives

$$\det\begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{bmatrix} = -0\cdot\det\begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix} + 2\cdot\det\begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} - 1\cdot\det\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = 1.$$


To verify that these formulas yield the determinant, one can check the properties (1.4.7). 
The alternating signs that appear in the formulas can be read off of this figure:


(1.6.3) $\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \\ + & - & + & \\ \vdots & & & \ddots \end{bmatrix}$


The notation $(-1)^{i+j}$ for the alternating sign may seem pedantic, and harder to remember
than the figure. However, it is useful because it can be manipulated by the rules of algebra.


We describe one more expression for the determinant, the complete expansion. The 
complete expansion is obtained by using linearity to expand on all the rows, first on (row 1) 
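then on (row 2), and so on. For a 2×2 matrix, this expansion is made as follows:

$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = a\det\begin{bmatrix} 1 & 0 \\ c & d \end{bmatrix} + b\det\begin{bmatrix} 0 & 1 \\ c & d \end{bmatrix}$$

$$= ac\det\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} + ad\det\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + bc\det\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + bd\det\begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}.$$

The first and fourth terms in the final expansion are zero, and

$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad\det\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + bc\det\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = ad - bc.$$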


Carrying this out for n×n matrices leads to the complete expansion of the determinant,
the formula


(1.6.4) $\det A = \sum_{\text{perm } p} (\operatorname{sign} p)\; a_{1,p1} \cdots a_{n,pn},$


in which the sum is over all permutations of the n indices, and (sign p) is the sign of the
permutation.
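The complete expansion (1.6.4) can be written out as a short program. The sketch below assumes 0-based indices and computes the sign of a permutation by counting inversions, a standard characterization of the sign that is not the definition (1.5.11) used in the text. The names `sign` and `det_complete` are illustrative.

```python
from itertools import permutations


def sign(p):
    """Sign of a permutation given as a tuple, via the parity of its inversions."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det_complete(A):
    """Sum over all permutations p of (sign p) * a_{1,p1} * ... * a_{n,pn}."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total


value = det_complete([[1, 0, 3], [2, 1, 2], [0, 5, 1]])
```

The loop runs over all n! permutations, which is already 3,628,800 terms for n = 10.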


For a 2×2 matrix, the complete expansion gives us back Formula (1.4.2). For a 3×3
matrix, the complete expansion has six terms, because there are six permutations of three
indices:

(1.6.5) $\det A = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.$


As an aid for remembering this expansion, one can display the block matrix [A|A]:

(1.6.6) $\left[\begin{array}{ccc|ccc} a_{11} & a_{12} & a_{13} & a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} & a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} & a_{31} & a_{32} & a_{33} \end{array}\right]$

The three terms with positive signs are the products of the terms along the three diagonals
that go downward from left to right, and the three terms with negative signs are the products
of terms on the diagonals that go downward from right to left.


Warning: The analogous method will not work with 4x4 determinants. 


The complete expansion is more of theoretical than of practical importance. Unless
n is small or the matrix is very special, it has too many terms (n! of them) to be useful for
computation. Its theoretical importance comes from the fact that determinants are exhibited
as polynomials in the $n^2$ variable matrix entries $a_{ij}$, with coefficients ±1. For example,
if each matrix entry $a_{ij}$ is a differentiable function of a variable t, then because sums
and products of differentiable functions are differentiable, det A is also a differentiable
function of t.


The Cofactor Matrix

The cofactor matrix of an n×n matrix A is the n×n matrix cof(A) whose i, j entry is


(1.6.7) $\operatorname{cof}(A)_{ij} = (-1)^{i+j}\det A_{ji},$

where, as before, $A_{ji}$ is the matrix obtained by crossing out the jth row and the ith column.
So the cofactor matrix is the transpose of the matrix made up of the (n − 1)×(n − 1) minors
of A, with signs as in (1.6.3). This matrix is used to provide a formula for the inverse matrix.

If you need to compute a cofactor matrix, it is safest to make the computation in three
steps: First compute the matrix whose i, j entry is the minor $\det A_{ij}$, then adjust signs, and
finally transpose. Here is the computation for a particular 3×3 matrix:


(1.6.8) $A = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{bmatrix}: \quad \begin{bmatrix} 4 & -1 & -2 \\ 2 & 0 & -1 \\ -3 & 1 & 2 \end{bmatrix}, \quad \begin{bmatrix} 4 & 1 & -2 \\ -2 & 0 & 1 \\ -3 & -1 & 2 \end{bmatrix}, \quad \begin{bmatrix} 4 & -2 & -3 \\ 1 & 0 & -1 \\ -2 & 1 & 2 \end{bmatrix} = \operatorname{cof}(A).$


Theorem 1.6.9 Let A be an n×n matrix, let C = cof(A) be its cofactor matrix, and let
α = det A. If α ≠ 0, then A is invertible, and $A^{-1} = \alpha^{-1}C$. In any case, CA = AC = αI.


Here αI is the diagonal matrix with diagonal entries equal to α. For the inverse of a 2×2
matrix, the theorem gives us back Formula 1.1.17. The determinant of the 3×3 matrix A
whose cofactor matrix is computed in (1.6.8) above happens to be 1, so for that matrix,
$A^{-1} = \operatorname{cof}(A)$.
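The three-step recipe and the identity CA = AC = αI can be tried out as follows. This is a sketch for matrices of size n ≥ 2 stored as lists of rows; the helper names `minor`, `det`, `cofactor_matrix` and `matmul` are illustrative, and the determinant is computed by expansion on the first column.

```python
def minor(A, i, j):
    """Submatrix of A with row i and column j crossed out (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion by minors on the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))


def cofactor_matrix(A):
    """Cofactor matrix of an n x n matrix with n >= 2, computed in three steps."""
    n = len(A)
    minors = [[det(minor(A, i, j)) for j in range(n)] for i in range(n)]   # step 1
    signed = [[(-1) ** (i + j) * minors[i][j] for j in range(n)]
              for i in range(n)]                                          # step 2
    return [list(col) for col in zip(*signed)]                            # step 3


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 1, 2], [0, 2, 1], [1, 0, 2]]
C = cofactor_matrix(A)
CA, AC = matmul(C, A), matmul(A, C)
```

For this matrix det A = 1, so both products are the identity matrix and `C` is the inverse of A.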


Proof of Theorem 1.6.9. We show that the i, j entry of the product CA is equal to α if i = j
and is zero otherwise. Let $A_i$ denote the ith column of A. Denoting the entries of C and A
by $c_{ij}$ and $a_{ij}$, the i, j entry of the product CA is


(1.6.10) $\sum_\nu c_{i\nu}\,a_{\nu j} = \sum_\nu (-1)^{\nu+i}\det A_{\nu i}\;a_{\nu j}.$


When i = j, this is the formula (1.6.1) for the determinant by expansion by minors on
column j. So the diagonal entries of CA are equal to α, as claimed.

Suppose that i ≠ j. We form a new matrix M in the following way: The entries of M are
equal to the entries of A, except for those in column i. The ith column $M_i$ of M is equal to
the jth column $A_j$ of A. Thus the ith and the jth columns of M are both equal to $A_j$, and
det M = 0.

Let D be the cofactor matrix of M, with entries $d_{ij}$. The i, i entry of DM is

$$\sum_\nu d_{i\nu}\,m_{\nu i} = \sum_\nu (-1)^{\nu+i}\det M_{\nu i}\;m_{\nu i}.$$

This sum is equal to det M, which is zero.

On the other hand, since the ith column of M is crossed out when forming $M_{\nu i}$, that
minor is equal to $A_{\nu i}$. And since the ith column of M is equal to the jth column of A,
$m_{\nu i} = a_{\nu j}$. So the i, i entry of DM is also equal to

$$\sum_\nu (-1)^{\nu+i}\det A_{\nu i}\;a_{\nu j},$$

which is the i, j entry of CA that we want to determine. Therefore the i, j entry of CA is
zero, and CA = αI, as claimed. It follows that $A^{-1} = \alpha^{-1}\operatorname{cof}(A)$ if α ≠ 0. The computation
of the product AC is done in a similar way, using expansion by minors on rows. □


A general algebraical determinant in its developed form 

may be likened to a mixture of liquids seemingly homogeneous, 

but which, being of differing boiling points, admit of being separated 
by the process of fractional distillation. 


—James Joseph Sylvester 


EXERCISES 


Section 1 The Basic Operations 


1.1. What are the entries $a_{21}$ and $a_{23}$ of the matrix A = [3×3 matrix; entries illegible in the source]?

1.2. Determine the products AB and BA for the following values of A and B:
[two pairs of matrices A, B; entries illegible in the source]

1.3. Let $A = [a_1 \cdots a_n]$ be a row vector, and let $B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$ be a column vector. Compute
the products AB and BA.

1.4. Verify the associative law for the matrix product [three matrices; entries illegible in the source].

Note: This is a self-checking problem. It won’t come out unless you multiply correctly. If
you need to practice matrix multiplication, use this problem as a model.

1.5.³ Let A, B, and C be matrices of sizes ℓ×m, m×n, and n×p. How many multiplications
are required to compute the product AB? In which order should the triple product ABC
be computed, so as to minimize the number of multiplications required?

1.6. Compute $\begin{bmatrix} 1 & a \\ & 1 \end{bmatrix}\begin{bmatrix} 1 & b \\ & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & a \\ & 1 \end{bmatrix}^n$.

1.7. Find a formula for $\begin{bmatrix} 1 & 1 & 1 \\ & 1 & 1 \\ & & 1 \end{bmatrix}^n$, and prove it by induction.


³ Suggested by Gilbert Strang.
1.8. Compute the following products by block multiplication:
[two products of block matrices; entries illegible in the source]

1.9. Let A, B be square matrices.
(a) When is $(A + B)(A - B) = A^2 - B^2$? (b) Expand $(A + B)^3$.

1.10. Let D be the diagonal matrix with diagonal entries $d_1, \ldots, d_n$, and let $A = (a_{ij})$ be an
arbitrary n×n matrix. Compute the products DA and AD.

1.11. Prove that the product of upper triangular matrices is upper triangular.

1.12. In each case, find all 2×2 matrices that commute with the given matrix.
(a) through (e): [five 2×2 matrices; entries illegible in the source]

1.13. A square matrix A is nilpotent if $A^k = 0$ for some k > 0. Prove that if A is nilpotent, then
I + A is invertible. Do this by finding the inverse.

1.14. Find infinitely many matrices B such that $BA = I_2$ when

A = [3×2 matrix; entries illegible in the source],

and prove that there is no matrix C such that $AC = I_3$.

1.15. With A arbitrary, determine the products [list of products of A with matrix units; illegible in the source].


Section 2 Row Reduction


2.1. For the reduction of the matrix M (1.2.8) given in the text, determine the elementary
matrices corresponding to each operation. Compute the product P of these elementary
matrices and verify that PM is indeed the end result.


2.2. Find all solutions of the system of equations AX = B when

$$A = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 3 & 0 & 0 & 4 \\ 1 & -4 & -2 & 2 \end{bmatrix} \quad\text{and}\quad B = \text{(a)}\ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},\quad \text{(b)}\ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},\quad \text{(c)}\ \begin{bmatrix} 0 \\ 2 \\ 2 \end{bmatrix}.$$


2.3. Find all solutions of the equation $x_1 + x_2 + 2x_3 - x_4 = 3$.


2.4. Determine the elementary matrices used in the row reduction in Example (1.2.18), and
verify that their product is $A^{-1}$.

2.5. Find inverses of the following matrices:
[matrices illegible in the source]


2.6. The matrix below is based on the Pascal triangle. Find its inverse.
[matrix illegible in the source]


2.7. Make a sketch showing the effect of multiplication by the matrix A = [2×2 matrix; entries illegible in the source] on
the plane $\mathbb{R}^2$.


2.8. Prove that if a product AB of n Xn matrices is invertible, so are the factors A and B. 
2.9. Consider an arbitrary system of linear equations AX = B, where A and B are real 
matrices. 
(a) Prove that if the system of equations AX = B has more than one solution then it has 
infinitely many. 
(b) Prove that if there is a solution in the complex numbers then there is also a real 
solution. 


2.10. Let A be a square matrix. Show that if the system AX = B has a unique solution for some 
particular column vector B, then it has a unique solution for all B. 


Section 3 The Matrix Transpose

3.1. A matrix B is symmetric if $B = B^t$. Prove that for any square matrix B, $BB^t$ and $B + B^t$
are symmetric, and that if A is invertible, then $(A^{-1})^t = (A^t)^{-1}$.


3.2. Let A and B be symmetric n Xn matrices. Prove that the product AB is symmetric if and 
only if AB = BA. 


3.3. Suppose we make first a row operation, and then a column operation, on a matrix A. 
Explain what happens if we switch the order of these operations, making the column 
operation first, followed by the row operation. 


3.4. How much can a matrix be simplified if both row and column operations are allowed? 


Section 4 Determinants

4.1. Evaluate the following determinants:
[determinants illegible in the source]

4.2. (self-checking) Verify the rule det AB = (det A)(det B) for the matrices
[matrices A and B illegible in the source]


4.3. Compute the determinant of the following n×n matrix using induction on n:

$$\begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1 \\ & & & -1 & 2 \end{bmatrix}$$


4.4. Let A be an n×n matrix. Determine det(−A) in terms of det A.

4.5. Use row reduction to prove that $\det A^t = \det A$.

4.6. Prove that $\det\begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = (\det A)(\det D)$, if A and D are square blocks.

Section 5 Permutation Matrices

5.1. Write the following permutations as products of disjoint cycles:
(12)(13)(14)(15), (123)(234)(345), (1234)(2345), (12)(23)(34)(45)(51).

5.2. Let p be the permutation (1342) of four indices. 

(a) Find the associated permutation matrix P. 

(b) Write p as a product of transpositions and evaluate the corresponding matrix product. 


(c) Determine the sign of p. 
5.3. Prove that the inverse of a permutation matrix P is its transpose. 


5.4. What is the permutation matrix associated to the permutation of n indices defined by 
p(i) =n —i+1? What is the cycle decomposition of p? What is its sign? 


5.5. In the text, the products qp and pq of the permutations (1.5.2) and (1.5.5) were seen to
be different. However, both products turned out to be 3-cycles. Is this an accident? 


Section 6 Other Formulas for the Determinant

6.1. (a) Compute the determinants of the following matrices by expansion on the bottom row:

[Matrix displays not recoverable from the source.]

(b) Compute the determinants of these matrices using the complete expansion.
(c) Compute the cofactor matrices of these matrices, and verify Theorem 1.6.9 for them.

6.2. Let A be an $n \times n$ matrix with integer entries $a_{ij}$. Prove that A is invertible, and that its
inverse $A^{-1}$ has integer entries, if and only if $\det A = \pm 1$.
Miscellaneous Problems

*M.1. Let a $2n \times 2n$ matrix be given in the form $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$, where each block is an $n \times n$
matrix. Suppose that A is invertible and that AC = CA. Use block multiplication to prove
that det M = det(AD − CB). Give an example to show that this formula need not hold if
AC ≠ CA.

M.2. Let A be an $m \times n$ matrix with m < n. Prove that A has no left inverse by comparing A
to the square $n \times n$ matrix obtained by adding (n − m) rows of zeros at the bottom.

M.3. The trace of a square matrix is the sum of its diagonal entries:

trace A = $a_{11} + a_{22} + \cdots + a_{nn}$.

Show that trace(A + B) = trace A + trace B, that trace AB = trace BA, and that if B is
invertible, then trace A = trace $BAB^{-1}$.

M.4. Show that the equation AB − BA = I has no solution in real $n \times n$ matrices A and B.

M.5. Write the matrix [entries not recoverable from the source] as a product of elementary matrices, using as few as you can,
and prove that your expression is as short as possible.

M.6. Determine the smallest integer n such that every invertible $2 \times 2$ matrix can be written as
a product of at most n elementary matrices.

M.7. (Vandermonde determinant)

(a) Prove that $\det \begin{bmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{bmatrix} = (a-b)(b-c)(c-a)$.

(b) Prove an analogous formula for $n \times n$ matrices, using appropriate row operations to
clear out the first column.

(c) Use the Vandermonde determinant to prove that there is a unique polynomial p(t)
of degree n that takes arbitrary prescribed values at n + 1 points $t_0, \ldots, t_n$.

*M.8. (an exercise in logic) Consider a general system AX = B of m linear equations in n
unknowns, where m and n are not necessarily equal. The coefficient matrix A may have
a left inverse L, a matrix such that $LA = I_n$. If so, we may try to solve the system as we
learn to do in school:

AX = B, LAX = LB, X = LB.

But when we try to check our work by running the solution backward, we run into trouble:
If X = LB, then AX = ALB. We seem to want L to be a right inverse, which isn't what
was given.

(a) Work some examples to convince yourself that there is a problem here.
(b) Exactly what does the sequence of steps made above show? What would the existence
of a right inverse show? Explain clearly.

M.9. Let A be a real $2 \times 2$ matrix, and let $A_1$, $A_2$ be the columns of A. Let P be the parallelogram
whose vertices are 0, $A_1$, $A_2$, $A_1 + A_2$. Determine the effect of elementary row operations
on the area of P, and use this to prove that the absolute value |det A| of the determinant
of A is equal to the area of P.
*M.10. Let A, B be $m \times n$ and $n \times m$ matrices. Prove that $I_m - AB$ is invertible if and only if
$I_n - BA$ is invertible.

Hint: Perhaps the only approach available to you at this time is to find an explicit
expression for one inverse in terms of the other. As a heuristic tool, you could try
substituting into the power series expansion for $(1 - x)^{-1}$. The substitution will make no
sense unless some series converge, and this needn't be the case. But any way to guess a
formula is permissible, provided that you check your guess afterward.

M.11.† (discrete Dirichlet problem) A function f(u, v) is harmonic if it satisfies the Laplace
equation $\frac{\partial^2 f}{\partial u^2} + \frac{\partial^2 f}{\partial v^2} = 0$. The Dirichlet problem asks for a harmonic function on a plane
region R with prescribed values on the boundary. This exercise solves the discrete version
of the Dirichlet problem.

Let f be a real valued function whose domain of definition is the set of integers $\mathbb{Z}$. To
avoid asymmetry, the discrete derivative is defined on the shifted integers $\mathbb{Z} + \tfrac{1}{2}$, as the
first difference $f'(n + \tfrac{1}{2}) = f(n+1) - f(n)$. The discrete second derivative is back on
the integers: $f''(n) = f'(n + \tfrac{1}{2}) - f'(n - \tfrac{1}{2}) = f(n+1) - 2f(n) + f(n-1)$.

Let f(u, v) be a function whose domain is the lattice of points in the plane with integer
coordinates. The formula for the discrete second derivative shows that the discrete version
of the Laplace equation for f is

$f(u+1, v) + f(u-1, v) + f(u, v+1) + f(u, v-1) - 4f(u, v) = 0$.

So f is harmonic if its value at a point (u, v) is the average of the values at its four
neighbors.

A discrete region R in the plane is a finite set of integer lattice points. Its boundary
$\partial R$ is the set of lattice points that are not in R, but which are at a distance 1 from some
point of R. We'll call R the interior of the region $\overline{R} = R \cup \partial R$. Suppose that a function
β is given on the boundary $\partial R$. The discrete Dirichlet problem asks for a function f
defined on $\overline{R}$, that is equal to β on the boundary, and that satisfies the discrete Laplace
equation at all points in the interior. This problem leads to a system of linear equations
that we abbreviate as LX = B. To set the system up, we write $\beta_{uv}$ for the given value
of the function β at a boundary point. So $f(u, v) = \beta_{uv}$ at a boundary point (u, v). Let
$x_{uv}$ denote the unknown value of the function f(u, v) at a point (u, v) of R. We order
the points of R arbitrarily and assemble the unknowns $x_{uv}$ into a column vector X. The
coefficient matrix L expresses the discrete Laplace equation, except that when a point
of R has some neighbors on the boundary, the corresponding terms will be the given
boundary values. These terms are moved to the other side of the equation to form the
vector B.


(a) When R is the set of five points (0, 0), (0, ±1), (±1, 0), there are eight boundary
points. Write down the system of linear equations in this case, and solve the Dirichlet
problem when β is the function on $\partial R$ defined by $\beta_{uv} = 0$ if $v \le 0$ and $\beta_{uv} = 1$ if
$v > 0$.

(b) The maximum principle states that a harmonic function takes on its maximal value
on the boundary. Prove the maximum principle for discrete harmonic functions.

(c) Prove that the discrete Dirichlet problem has a unique solution for every region R
and every boundary function β.


†I learned this problem from Peter Lax, who told me that he had learned it from my father, Emil Artin.


CHAPTER 2


Groups 


Il est peu de notions en mathématiques qui soient plus primitives 
que celle de loi de composition. 


—Nicolas Bourbaki 


2.1 LAWS OF COMPOSITION 


A law of composition on a set S is any rule for combining pairs a, b of elements of S to get
another element, say p, of S. Some models for this concept are addition and multiplication
of real numbers. Matrix multiplication on the set of $n \times n$ matrices is another example.
Formally, a law of composition is a function of two variables, or a map


$S \times S \to S$.


Here $S \times S$ denotes, as always, the product set, whose elements are pairs a, b of elements
of S.

The element obtained by applying the law to a pair a, b is usually written using a 
notation resembling one used for multiplication or addition: 


$p = ab, \quad a \times b, \quad a \circ b, \quad a + b,$


or whatever, a choice being made for the particular law in question. The element p may be 
called the product or the sum of a and b, depending on the notation chosen. 

We will use the product notation ab most of the time. Anything done with product 
notation can be rewritten using another notation such as addition, and it will continue to be 
valid. The rewriting is just a change of notation. 

It is important to note right away that ab stands for a certain element of S, namely for
the element obtained by applying the given law to the elements denoted by a and b. Thus
if the law is matrix multiplication and if $a = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$ and $b = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}$, then ab denotes
the matrix $\begin{bmatrix} 7 & 3 \\ 10 & 4 \end{bmatrix}$. Once the product ab has been evaluated, the elements a and b cannot
be recovered from it.
With multiplicative notation, a law of composition is associative if the rule


(2.1.1) (ab)c = a(bc) (associative law)

holds for all a, b, c in S, where (ab)c means first multiply (apply the law to) a and b, then
multiply the result ab by c. A law of composition is commutative if


(2.1.2) ab = ba (commutative law)


holds for all a and b in S. Matrix multiplication is associative, but not commutative.

It is customary to reserve additive notation a + b for commutative laws, that is, laws such
that a + b = b + a for all a and b. Multiplicative notation carries no implication either way
concerning commutativity.

The associative law is more fundamental than the commutative law, and one reason for
this is that composition of functions is associative. Let T be a set, and let g and f be maps
(or functions) from T to T. Let $g \circ f$ denote the composed map $t \mapsto g(f(t))$: first apply f,
then g. The rule

$g, f \mapsto g \circ f$

is a law of composition on the set of maps $T \to T$. This law is associative. If f, g, and h are
three maps from T to T, then $(h \circ g) \circ f = h \circ (g \circ f)$:

$T \xrightarrow{\ f\ } T \xrightarrow{\ g\ } T \xrightarrow{\ h\ } T$

Both of the composed maps send an element t to h(g(f(t))).
When T contains two elements, say T = {a, b}, there are four maps $T \to T$:


i: the identity map, defined by i(a) = a, i(b) = b;
τ: the transposition, defined by τ(a) = b, τ(b) = a;
α: the constant function α(a) = α(b) = a;
β: the constant function β(a) = β(b) = b.


The law of composition on the set {i, τ, α, β} of maps $T \to T$ can be exhibited in a
multiplication table:

(2.1.3)
          i    τ    α    β
     i    i    τ    α    β
     τ    τ    i    β    α
     α    α    α    α    α
     β    β    β    β    β

which is to be read in this way: the entry in the row labeled f and the column labeled g is
the composition $f \circ g$.

Thus $\tau \circ \alpha = \beta$, while $\alpha \circ \tau = \alpha$. Composition of functions is not a commutative law.
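The table for the four maps of a two-element set can be recomputed mechanically. The following is a minimal sketch, assuming the maps are stored as Python dictionaries; the names `maps`, `compose`, `name_of`, and `table` are illustrative and do not come from the text.

```python
from itertools import product

# The two-element set T and its four self-maps, stored as dictionaries.
T = ("a", "b")
maps = {
    "i":     {"a": "a", "b": "b"},  # identity
    "tau":   {"a": "b", "b": "a"},  # transposition
    "alpha": {"a": "a", "b": "a"},  # constant map with value a
    "beta":  {"a": "b", "b": "b"},  # constant map with value b
}

def compose(g, f):
    """Return the composed map g o f: first apply f, then g."""
    return {t: g[f[t]] for t in T}

def name_of(m):
    """Look up the label of a map given as a dictionary."""
    return next(label for label, candidate in maps.items() if candidate == m)

# table[(f, g)] is the label of f o g: row f, column g.
table = {
    (f, g): name_of(compose(maps[f], maps[g]))
    for f, g in product(maps, repeat=2)
}

for f in maps:
    print(f.ljust(6), " ".join(table[(f, g)].ljust(6) for g in maps))
```

Comparing `table[("tau", "alpha")]` with `table[("alpha", "tau")]` exhibits the failure of commutativity, while looping over all triples confirms associativity for this small example.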
Going back to a general law of composition, suppose we want to define the product of
a string of n elements of a set: $a_1 a_2 \cdots a_n = ?$ There are various ways to do this using the
given law, which tells us how to multiply two elements. For instance, we could first use the
law to find the product $a_1 a_2$, then multiply this element by $a_3$, and so on:


$((a_1 a_2) a_3) a_4 \cdots$.


There are several other ways to form a product with the elements in the given order, but if
the law is associative, then all of them yield the same element of S. This allows us to speak
of the product of an arbitrary string of elements.


Proposition 2.1.4 Let an associative law of composition be given on a set S. There is a
unique way to define, for every integer n, a product of n elements $a_1, \ldots, a_n$ of S, denoted
temporarily by $[a_1 \cdots a_n]$, with the following properties:

(i) The product $[a_1]$ of one element is the element itself.
(ii) The product $[a_1 a_2]$ of two elements is given by the law of composition.
(iii) For any integer i in the range $1 \le i < n$, $[a_1 \cdots a_n] = [a_1 \cdots a_i][a_{i+1} \cdots a_n]$.

The right side of equation (iii) means that the two products $[a_1 \cdots a_i]$ and $[a_{i+1} \cdots a_n]$ are
formed first, and the results are then multiplied using the law of composition.


Proof. We use induction on n. The product is defined by (i) and (ii) for $n \le 2$, and it does
satisfy (iii) when n = 2. Suppose that we have defined the product of r elements when
$r \le n - 1$, and that it is the unique product satisfying (iii). We then define the product of n
elements by the rule

$[a_1 \cdots a_n] = [a_1 \cdots a_{n-1}][a_n]$,

where the terms on the right side are those already defined. If a product satisfying (iii) exists,
then this formula gives the product because it is (iii) when i = n − 1. So if the product of n
elements exists, it is unique. We must now check (iii) for i < n − 1:

$[a_1 \cdots a_n] = [a_1 \cdots a_{n-1}][a_n]$ (our definition)
$= ([a_1 \cdots a_i][a_{i+1} \cdots a_{n-1}])[a_n]$ (induction hypothesis)
$= [a_1 \cdots a_i]([a_{i+1} \cdots a_{n-1}][a_n])$ (associative law)
$= [a_1 \cdots a_i][a_{i+1} \cdots a_n]$ (induction hypothesis).

This completes the proof. We will drop the brackets from now on and denote the product by
$a_1 \cdots a_n$. □
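Proposition 2.1.4 can be tested on small cases by forming every parenthesization of a string and collecting the results. This is a minimal sketch, not part of the text: the function names and the sample matrices are illustrative assumptions, and subtraction is included only as a contrasting non-associative law.

```python
def all_products(elements, law):
    """Return the set of values obtained from every way of parenthesizing
    the ordered string `elements` under the binary operation `law`."""
    if len(elements) == 1:
        return {elements[0]}
    results = set()
    # Split the string after position i, as in property (iii).
    for i in range(1, len(elements)):
        for left in all_products(elements[:i], law):
            for right in all_products(elements[i:], law):
                results.add(law(left, right))
    return results

def matmul2(p, q):
    """Multiply 2x2 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )

matrices = [((1, 2), (0, 1)), ((0, 1), (1, 1)), ((2, 0), (1, 3)), ((1, 1), (1, 0))]
associative_values = all_products(matrices, matmul2)
subtraction_values = all_products([8, 4, 2, 1], lambda a, b: a - b)

print(len(associative_values))  # one value: every bracketing agrees
print(len(subtraction_values))  # several values: subtraction is not associative
```

A finite check of this kind illustrates the statement but does not replace the induction argument, which covers strings of every length.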


An identity for a law of composition is an element e of S such that
(2.1.5) ea = a and ae = a, for all a in S.


There can be at most one identity, for if e and e′ are two such elements, then since e is an
identity, ee′ = e′, and since e′ is an identity, e = ee′. Thus e = ee′ = e′.

Both matrix multiplication and composition of functions have an identity. For $n \times n$
matrices it is the identity matrix I, and for the set of maps $T \to T$ it is the identity map, the
map that carries each element of T to itself.

• The identity element will often be denoted by 1 if the law of composition is written
multiplicatively, and by 0 if the law is written additively. These elements do not need to be
related to the numbers 1 and 0, but they share the property of being identity elements for
their laws of composition.


Suppose that a law of composition on a set S, written multiplicatively, is associative 
and has an identity 1. An element a of S is invertible if there is another element b such that 


ab =1 and ba=1, 


and if so, then b is called the inverse of a. The inverse of an element is usually denoted by
$a^{-1}$, or when additive notation is being used, by −a.

We list without proof some elementary properties of inverses. All but the last have 
already been discussed for matrices. For an example that illustrates the last statement, see 
Exercise 1.3. 


• If an element a has both a left inverse ℓ and a right inverse r, i.e., if ℓa = 1 and
ar = 1, then ℓ = r, a is invertible, and r is its inverse.

• If a is invertible, its inverse is unique.

• Inverses multiply in the opposite order: If a and b are invertible, so is the product
ab, and $(ab)^{-1} = b^{-1}a^{-1}$.

• An element a may have a left inverse or a right inverse, though it is not invertible.


Power notation may be used for an associative law: With n > 0, $a^n = a \cdots a$ (n factors),
$a^{-n} = a^{-1} \cdots a^{-1}$, and $a^0 = 1$. The usual rules for manipulation of powers hold: $a^r a^s = a^{r+s}$
and $(a^r)^s = a^{rs}$. When additive notation is used for the law of composition, the power
notation $a^n$ is replaced by the notation $na = a + \cdots + a$.

Fraction notation $\frac{b}{a}$ is not advisable unless the law of composition is commutative,
because it isn't clear from the notation whether the fraction stands for $ba^{-1}$ or for $a^{-1}b$, and
these two elements may be different.


2.2 GROUPS AND SUBGROUPS 
A group is a set G together with a law of composition that has the following properties: 


• The law of composition is associative: (ab)c = a(bc) for all a, b, c in G.
• G contains an identity element 1, such that 1a = a and a1 = a for all a in G.
• Every element a of G has an inverse, an element b such that ab = 1 and ba = 1.


An abelian group is a group whose law of composition is commutative. 

For example, the set of nonzero real numbers forms an abelian group under multiplication,
and the set of all real numbers forms an abelian group under addition. The set of
invertible $n \times n$ matrices, the general linear group, is a very important group in which the
law of composition is matrix multiplication. It is not abelian unless n = 1. 
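For a finite set, the three group axioms can be checked by brute force. The sketch below is an illustration under stated assumptions: the helper names `is_group` and `is_abelian` are hypothetical, and the test case (nonzero residues modulo 5 under multiplication) is a finite stand-in chosen here, not an example from the text.

```python
from itertools import product

def is_group(elements, law):
    """Brute-force check of the group axioms on a finite set."""
    elements = list(elements)
    # The law must send pairs of elements back into the set.
    if any(law(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (ab)c == a(bc) for all triples.
    if any(law(law(a, b), c) != law(a, law(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some e with ea == a == ae for all a.
    identities = [e for e in elements
                  if all(law(e, a) == a and law(a, e) == a for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a has some b with ab == e == ba.
    return all(any(law(a, b) == e and law(b, a) == e for b in elements)
               for a in elements)

def is_abelian(elements, law):
    """Check commutativity of the law on a finite set."""
    elements = list(elements)
    return all(law(a, b) == law(b, a) for a, b in product(elements, repeat=2))

def mult_mod_5(a, b):
    return (a * b) % 5

print(is_group([1, 2, 3, 4], mult_mod_5))     # nonzero residues form a group
print(is_group([0, 1, 2, 3, 4], mult_mod_5))  # fails: 0 has no inverse
```

The second call fails for the same reason that the set of all real numbers is not a group under multiplication: the element 0 is not invertible.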

When the law of composition is evident, it is customary to denote a group and the set 
of its elements by the same symbol. 


The order of a group G is the number of elements that it contains. We will often denote
the order by |G|:


(2.2.1) |G| = number of elements, the order, of G.

If the order is finite, G is said to be a finite group. If not, G is an infinite group. The same
terminology is used for any set. The order |S| of a set S is the number of its elements.
Here is our notation for some familiar infinite abelian groups:


(2.2.2)
$\mathbb{Z}^+$: the set of integers, with addition as its law of composition, the additive group of integers,
$\mathbb{R}^+$: the set of real numbers, with addition as its law of composition, the additive group of real numbers,
$\mathbb{R}^\times$: the set of nonzero real numbers, with multiplication as its law of composition, the multiplicative group,
$\mathbb{C}^+$, $\mathbb{C}^\times$: the analogous groups, where the set $\mathbb{C}$ of complex numbers replaces the set $\mathbb{R}$ of real numbers.


Warning: Others might use the symbol $\mathbb{R}^+$ to denote the set of positive real numbers. To
be unambiguous, it might be better to denote the additive group of reals by $(\mathbb{R}, +)$, thus
displaying its law of composition explicitly. However, our notation is more compact. Also,
the symbol $\mathbb{R}^\times$ denotes the multiplicative group of nonzero real numbers. The set of all real
numbers is not a group under multiplication because 0 isn't invertible. □


Proposition 2.2.3 Cancellation Law. Let a, b, c be elements of a group G whose law of
composition is written multiplicatively. If ab = ac or if ba = ca, then b = c. If ab = a or if
ba = a, then b = 1.


Proof. Multiply both sides of ab = ac on the left by $a^{-1}$ to obtain b = c. The other proofs
are analogous. □


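Multiplication by $a^{-1}$ is essential for this proof. The Cancellation Law needn't hold when
the element a is not invertible. For instance,


$\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$.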


Two basic examples of groups are obtained from laws of composition that we have 
considered — multiplication of matrices and composition of functions — by leaving out the 
elements that are not invertible. 

• The $n \times n$ general linear group is the group of all invertible $n \times n$ matrices. It is denoted by

(2.2.4) $GL_n = \{n \times n \text{ invertible matrices } A\}$.

If we want to indicate that we are working with real or with complex matrices, we write
$GL_n(\mathbb{R})$ or $GL_n(\mathbb{C})$, according to the case.

Let M be the set of maps from a set T to itself. A map $f: T \to T$ has an inverse
function if and only if it is bijective, in which case we say f is a permutation of T. The
permutations of T form a group, the law being composition of maps. As in Section 1.5, we
use multiplicative notation for the composition of permutations, writing qp for $q \circ p$.


• The group of permutations of the set of indices {1, 2, ..., n} is called the symmetric group,
and is denoted by $S_n$:


(2.2.5) $S_n$ is the group of permutations of the indices 1, 2, ..., n.
There are n! ('n factorial' = 1·2·3···n) permutations of a set of n elements, so the
symmetric group $S_n$ is a finite group of order n!.

The permutations of a set {a, b} of two elements are the identity i and the transposition
τ (see 2.1.3). They form a group of order two. If we replace a by 1 and b by 2, we see that
this is the same group as the symmetric group $S_2$. There is essentially only one group G of
order two. To see this, we note that one of its elements must be the identity 1; let the other
element be g. The multiplication table for the group contains the four products 11, 1g, g1,
and gg. All except gg are determined by the fact that 1 is the identity element. Moreover,
the Cancellation Law shows that gg ≠ g. The only possibility is gg = 1. So the multiplication
table is completely determined. There is just one group law.

We describe the symmetric group $S_3$ next. This group, which has order six, serves
as a convenient example because it is the smallest group whose law of composition isn't
commutative. We will refer to it often. To describe it, we pick two particular permutations
in terms of which we can write all others. We take the cyclic permutation (123), and the
transposition (12), and label them as x and y, respectively. The rules


(2.2.6) $x^3 = 1, \quad y^2 = 1, \quad yx = x^2y$


are easy to verify. Using the cancellation law, one sees that the six elements $1, x, x^2, y, xy, x^2y$
are distinct. So they are the six elements of the group:


(2.2.7) $S_3 = \{1, x, x^2; y, xy, x^2y\}$.


In the future, we will refer to (2.2.6) and (2.2.7) as our "usual presentation" of the symmetric
group $S_3$. Note that $S_3$ is not a commutative group, because $yx \ne xy$.

The rules (2.2.6) suffice for computation. Any product of the elements x and y and of
their inverses can be shown to be equal to one of the products (2.2.7) by applying the rules
repeatedly. To do so, we move all occurrences of y to the right side using the last rule, and
we use the first two rules to keep the exponents small. For instance,

(2.2.8) $x^{-1}y^3x^2y = x^2yx^2y = x^2(yx)xy = x^2(x^2y)xy = xyxy = x(x^2y)y = 1$.

One can write out a multiplication table for S3 with the aid of the rules (2.2.6), and because 
of this, those rules are called defining relations for the group. We study defining relations in 
Chapter 7. 
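The rules (2.2.6) can be confirmed by direct computation with permutations. The following is a minimal sketch, assuming permutations are stored as tuples acting on the indices 0, 1, 2 (the indices 1, 2, 3 of the text shifted down by one); the helper names are illustrative.

```python
def compose(q, p):
    """Return the product qp: first apply p, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def power(p, n):
    """Return p composed with itself n times (n >= 0)."""
    result = tuple(range(len(p)))
    for _ in range(n):
        result = compose(result, p)
    return result

identity = (0, 1, 2)
x = (1, 2, 0)  # the cycle (123) of the text, with indices shifted to 0, 1, 2
y = (1, 0, 2)  # the transposition (12), likewise shifted

x2 = power(x, 2)
elements = {
    "1": identity, "x": x, "x^2": x2,
    "y": y, "xy": compose(x, y), "x^2y": compose(x2, y),
}

print(power(x, 3) == identity, power(y, 2) == identity)  # x^3 = 1 and y^2 = 1
print(compose(y, x) == compose(x2, y))                   # yx = x^2 y
print(len(set(elements.values())))                       # number of distinct elements
```

The same helpers evaluate the sample product (2.2.8): since the inverse of x is `x2` and the cube of y is y, the product is `compose(compose(x2, y), compose(x2, y))`.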


We stop here. The structure of $S_n$ becomes complicated very rapidly as n increases.


One reason that the general linear groups and the symmetric groups are important is 
that many other groups are contained in them as subgroups. A subset H of a group G is a 
subgroup if it has the following properties: 


(2.2.9)
• Closure: If a and b are in H, then ab is in H.
• Identity: 1 is in H.
• Inverses: If a is in H, then $a^{-1}$ is in H.

These conditions are explained as follows: The first one tells us that the law of composition
on the group G defines a law of composition on H, called the induced law. The second and
third conditions say that H is a group with respect to this induced law. Notice that (2.2.9)
mentions all parts of the definition of a group except for the associative law. We don't need
to mention associativity. It carries over automatically from G to the subset H.

Notes: (i) In mathematics, it is essential to learn the definition of each term. An intuitive
feeling will not suffice. For example, the set T of invertible real (upper) triangular $2 \times 2$
matrices is a subgroup of the general linear group $GL_2$, and there is only one way to verify
this, namely to go back to the definition. It is true that T is a subset of $GL_2$. One must verify
that the product of invertible triangular matrices is triangular, that the identity is triangular,
and that the inverse of an invertible triangular matrix is triangular. Of course these points
are very easy to check.

(ii) Closure is sometimes mentioned as one of the axioms for a group, to indicate that the
product ab of elements of G is again an element of G. We include closure as a part of what
is meant by a law of composition. Then it doesn't need to be mentioned separately in the
definition of a group.


Examples 2.2.10 


(a) The set of complex numbers of absolute value 1, the set of points on the unit circle in
the complex plane, is a subgroup of the multiplicative group $\mathbb{C}^\times$ called the circle group.


(b) The group of real $n \times n$ matrices with determinant 1 is a subgroup of the general linear
group $GL_n$, called the special linear group. It is denoted by $SL_n$:


(2.2.11) $SL_n(\mathbb{R})$ is the set of real $n \times n$ matrices A with determinant equal to 1.

The defining properties (2.2.9) are often very easy to verify for a particular subgroup, and
we may not carry the verification out.


• Every group G has two obvious subgroups: the group G itself, and the trivial subgroup
that consists of the identity element alone. A subgroup is a proper subgroup if it is not one
of those two.


2.3 SUBGROUPS OF THE ADDITIVE GROUP OF INTEGERS 


We review some elementary number theory here, in terms of subgroups of the additive
group $\mathbb{Z}^+$ of integers. To begin, we list the axioms for a subgroup when additive notation is
used in the group: A subset S of a group G with law of composition written additively is a
subgroup if it has these properties:


(2.3.1)
• Closure: If a and b are in S, then a + b is in S.
• Identity: 0 is in S.
• Inverses: If a is in S, then −a is in S.


Let a be an integer different from 0. We denote the subset of $\mathbb{Z}$ that consists of all
multiples of a by $\mathbb{Z}a$:


(2.3.2) $\mathbb{Z}a = \{n \in \mathbb{Z} \mid n = ka \text{ for some } k \text{ in } \mathbb{Z}\}$.


This is a subgroup of $\mathbb{Z}^+$. Its elements can also be described as the integers divisible by a.

Theorem 2.3.3 Let S be a subgroup of the additive group $\mathbb{Z}^+$. Either S is the trivial subgroup
{0}, or else it has the form $\mathbb{Z}a$, where a is the smallest positive integer in S.


Proof. Let S be a subgroup of $\mathbb{Z}^+$. Then 0 is in S, and if 0 is the only element of S then S
is the trivial subgroup. So that case is settled. Otherwise, S contains an integer n different
from 0, and either n or −n is positive. The third property of a subgroup tells us that −n is in
S, so in either case, S contains a positive integer. We must show that S is equal to $\mathbb{Z}a$, when
a is the smallest positive integer in S.

We first show that $\mathbb{Z}a$ is a subset of S, in other words, that ka is in S for every integer
k. If k is a positive integer, then ka = a + a + ··· + a (k terms). Since a is in S, closure and
induction show that ka is in S. Since inverses are in S, −ka is in S. Finally, 0 = 0a is in S.

Next we show that S is a subset of $\mathbb{Z}a$, that is, every element n of S is an integer
multiple of a. We use division with remainder to write n = qa + r, where q and r are integers
and where the remainder r is in the range $0 \le r < a$. Since $\mathbb{Z}a$ is contained in S, qa is in S,
and of course n is in S. Since S is a subgroup, r = n − qa is in S too. Now by our choice, a is
the smallest positive integer in S, while the remainder r is in the range $0 \le r < a$. The only
remainder that can be in S is 0. So r = 0 and n is the integer multiple qa of a. □


There is a striking application of Theorem 2.3.3 to subgroups that contain two integers
a and b. The set of all integer combinations ra + sb of a and b,


(2.3.4) $S = \mathbb{Z}a + \mathbb{Z}b = \{n \in \mathbb{Z} \mid n = ra + sb \text{ for some integers } r, s\}$


is a subgroup of $\mathbb{Z}^+$. It is called the subgroup generated by a and b because it is the smallest
subgroup that contains both a and b. Let's assume that a and b aren't both zero, so that S
is not the trivial subgroup {0}. Theorem 2.3.3 tells us that this subgroup S has the form $\mathbb{Z}d$
for some positive integer d; it is the set of integers divisible by d. The generator d is called
the greatest common divisor of a and b, for reasons that are explained in parts (a) and (b)
of the next proposition. The greatest common divisor of a and b is sometimes denoted by
gcd(a, b).
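The description of d as the smallest positive integer combination can be explored numerically. This is a brute-force sketch under an explicit assumption: the coefficients r and s are searched only in a bounded window (`bound`, a parameter introduced here), so it illustrates the statement rather than proving it.

```python
from math import gcd

def combinations_in_window(a, b, bound=50):
    """All values r*a + s*b with |r|, |s| <= bound."""
    return {r * a + s * b
            for r in range(-bound, bound + 1)
            for s in range(-bound, bound + 1)}

def smallest_positive_combination(a, b, bound=50):
    """Smallest positive integer combination found in the search window."""
    return min(v for v in combinations_in_window(a, b, bound) if v > 0)

for a, b in [(12, 18), (314, 136), (35, 64)]:
    d = smallest_positive_combination(a, b)
    values = combinations_in_window(a, b)
    # Every combination found is a multiple of d, and d agrees with math.gcd.
    print(a, b, d, d == gcd(a, b), all(v % d == 0 for v in values))
```

For each sample pair the smallest positive combination found agrees with `math.gcd`, and every combination in the window is a multiple of it, as the equality of the subgroup generated by a and b with the multiples of d predicts.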


Proposition 2.3.5 Let a and b be integers, not both zero, and let d be their greatest common
divisor, the positive integer that generates the subgroup $S = \mathbb{Z}a + \mathbb{Z}b$. So $\mathbb{Z}d = \mathbb{Z}a + \mathbb{Z}b$.
Then

(a) d divides a and b.

(b) If an integer e divides both a and b, it also divides d.

(c) There are integers r and s such that d = ra + sb.


Proof. Part (c) restates the fact that d is an element of S. Next, a and b are elements of S
and $S = \mathbb{Z}d$, so d divides a and b. Finally, if an integer e divides both a and b, then e divides
the integer combination ra + sb = d. □


Note: If e divides a and b, then e divides any integer of the form ma + nb. So (c) implies
(b). But (b) does not imply (c). As we shall see, property (c) is a powerful tool. □


One can compute a greatest common divisor easily by repeated division with remainder:
For example, if a = 314 and b = 136, then


314 = 2·136 + 42, 136 = 3·42 + 10, 42 = 4·10 + 2.

Using the first of these equations, one can show that any integer combination of 314 and 136
can also be written as an integer combination of 136 and the remainder 42, and vice versa. So
$\mathbb{Z}(314) + \mathbb{Z}(136) = \mathbb{Z}(136) + \mathbb{Z}(42)$, and therefore gcd(314, 136) = gcd(136, 42). Similarly,
gcd(136, 42) = gcd(42, 10) = gcd(10, 2) = 2. So the greatest common divisor of 314 and 136
is 2. This iterative method of finding the greatest common divisor of two integers is called
the Euclidean Algorithm.
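The repeated division above translates directly into a loop. The sketch below assumes positive integer inputs; the function names are illustrative. The second function uses the standard extended form of the algorithm, which also produces integers r and s with d = ra + sb as in Proposition 2.3.5(c).

```python
def euclid_steps(a, b):
    """Return gcd(a, b) for positive integers, with the division steps."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, q, b, r))  # records a = q*b + r
        a, b = b, r
    return a, steps

def extended_gcd(a, b):
    """Return (d, r, s) with d = gcd(a, b) = r*a + s*b."""
    # Invariant: the current a equals r0*A + s0*B and the current b equals
    # r1*A + s1*B, where A and B are the original inputs.
    r0, s0, r1, s1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
    return a, r0, s0

d, steps = euclid_steps(314, 136)
for a, q, b, r in steps:
    print(f"{a} = {q}*{b} + {r}")

d, r, s = extended_gcd(314, 136)
print(d, r, s, r * 314 + s * 136)
```

The first three printed divisions are the ones displayed in the text; the loop performs one further division, with remainder 0, which is what stops it.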

If integers a and b are given, a second way to find their greatest common divisor is
to factor each of them into prime integers and then to collect the common prime factors.
Properties (a) and (b) of Proposition 2.3.5 are easy to verify using this method. But without
Theorem 2.3.3, property (c), that the integer determined by this method is an integer
combination of a and b, wouldn't be clear at all. Let's not discuss this point further here. We
come back to it in Chapter 12.


Two nonzero integers a and b are said to be relatively prime if the only positive integer
that divides both of them is 1. Then their greatest common divisor is 1: $\mathbb{Z}a + \mathbb{Z}b = \mathbb{Z}$.


Corollary 2.3.6 A pair a, b of integers is relatively prime if and only if there are integers r
and s such that ra + sb = 1.


Corollary 2.3.7 Let p be a prime integer. If p divides a product ab of integers, then p 
divides a or p divides b. 


Proof. Suppose that the prime p divides ab but does not divide a. The only positive divisors
of p are 1 and p. Since p does not divide a, gcd(a, p) = 1. Therefore there are integers r
and s such that ra + sp = 1. We multiply by b: rab + spb = b, and we note that p divides
both rab and spb. So p divides b. □


There is another subgroup of $\mathbb{Z}^+$ associated to a pair a, b of integers, namely the
intersection $\mathbb{Z}a \cap \mathbb{Z}b$, the set of integers contained both in $\mathbb{Z}a$ and in $\mathbb{Z}b$. We assume now
that neither a nor b is zero. Then $\mathbb{Z}a \cap \mathbb{Z}b$ is a subgroup. It is not the trivial subgroup {0}
because it contains the product ab, which isn't zero. So $\mathbb{Z}a \cap \mathbb{Z}b$ has the form $\mathbb{Z}m$ for some
positive integer m. This integer m is called the least common multiple of a and b, sometimes
denoted by lcm(a, b), for reasons that are explained in the next proposition.


Proposition 2.3.8 Let a and b be integers different from zero, and let m be their least
common multiple, the positive integer that generates the subgroup $S = \mathbb{Z}a \cap \mathbb{Z}b$. So
$\mathbb{Z}m = \mathbb{Z}a \cap \mathbb{Z}b$. Then


(a) m is divisible by both a and b.
(b) If an integer n is divisible by a and by b, then it is divisible by m.


Proof. Both statements follow from the fact that an integer is divisible by a and by b if and
only if it is contained in $\mathbb{Z}m = \mathbb{Z}a \cap \mathbb{Z}b$. □


Corollary 2.3.9 Let d = gcd(a, b) and m = lcm(a, b) be the greatest common divisor and
least common multiple of a pair a, b of positive integers, respectively. Then ab = dm.


Proof. Since b/d is an integer, a divides ab/d. Similarly, b divides ab/d. So m divides
ab/d, and dm divides ab. Next, we write d = ra + sb. Then dm = ram + sbm. Both terms
on the right are divisible by ab, so ab divides dm. Since ab and dm are positive and each
one divides the other, ab = dm. □
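Corollary 2.3.9 is easy to spot-check. In this minimal sketch the least common multiple is found directly from its description as the smallest positive integer lying in both sets of multiples; `lcm_by_search` is a name introduced here, and `math.gcd` supplies d.

```python
from math import gcd

def lcm_by_search(a, b):
    """Smallest positive multiple of a that is also a multiple of b
    (a and b are assumed to be positive integers)."""
    m = a
    while m % b != 0:
        m += a
    return m

for a, b in [(4, 6), (314, 136), (7, 13), (12, 12)]:
    d, m = gcd(a, b), lcm_by_search(a, b)
    print(a, b, d, m, a * b == d * m)
```

The search terminates because the product ab is itself a common multiple, so the loop runs at most b times.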


2.4 CYCLIC GROUPS 


We come now to an important abstract example of a subgroup, the cyclic subgroup generated
by an arbitrary element x of a group G. We use multiplicative notation. The cyclic subgroup
H generated by x is the set of all elements that are powers of x:


(2.4.1) $H = \{\ldots, x^{-2}, x^{-1}, 1, x, x^2, \ldots\}$.


This is the smallest subgroup of G that contains x, and it is often denoted by $\langle x \rangle$. But to
interpret (2.4.1) correctly, we must remember that the notation $x^n$ represents an element
of the group that is obtained in a particular way. Different powers may represent the same
element. For example, if G is the multiplicative group $\mathbb{R}^\times$ and x = −1, then all elements in
the list are equal to 1 or to −1, and H is the set {1, −1}.

There are two possibilities: Either the powers $x^n$ represent distinct elements, or they
do not. We analyze the case that the powers of x are not distinct.


Proposition 2.4.2 Let ⟨x⟩ be the cyclic subgroup of a group G generated by an element x,
and let S denote the set of integers k such that x^k = 1.

(a) The set S is a subgroup of the additive group ℤ^+.
(b) Two powers x^r and x^s, with r ≥ s, are equal if and only if x^{r−s} = 1, i.e., if and only if
r − s is in S.
(c) Suppose that S is not the trivial subgroup. Then S = ℤn for some positive integer n.
The powers 1, x, x^2, ..., x^{n−1} are the distinct elements of the subgroup ⟨x⟩, and the
order of ⟨x⟩ is n.


Proof. (a) If x^k = 1 and x^ℓ = 1, then x^{k+ℓ} = x^k x^ℓ = 1. This shows that if k and ℓ are in S,
then k + ℓ is in S. So the first property (2.3.1) for a subgroup is verified. Also, x^0 = 1, so 0 is
in S. Finally, if k is in S, i.e., x^k = 1, then x^{−k} = (x^k)^{−1} = 1 too, so −k is in S.

(b) This follows from the Cancellation Law 2.2.3.

(c) Suppose that S ≠ {0}. Theorem 2.3.3 shows that S = ℤn, where n is the smallest positive
integer in S. If x^k is an arbitrary power, we divide k by n, writing k = qn + r with r in the
range 0 ≤ r < n. Then x^{qn} = 1^q = 1, and x^k = x^{qn} x^r = x^r. Therefore x^k is equal to one of
the powers 1, x, ..., x^{n−1}. It follows from (b) that these powers are distinct, because x^n is
the smallest positive power equal to 1. □


The group ⟨x⟩ = {1, x, ..., x^{n−1}} described by part (c) of this proposition is called a
cyclic group of order n. It is called cyclic because repeated multiplication by x cycles through
the n elements.

An element x of a group has order n if n is the smallest positive integer with the
property x^n = 1, which is the same thing as saying that the cyclic subgroup ⟨x⟩ generated
by x has order n.

With the usual presentation of the symmetric group S_3, the element x has order 3, and
y has order 2. In any group, the identity element is the only element of order 1.

If x^n ≠ 1 for all n > 0, one says that x has infinite order. The matrix [ 1 1 ; 0 1 ] has
infinite order in GL_2(ℝ), while [ 1 1 ; −1 0 ] has order 6. (Here [ a b ; c d ] denotes the
2 × 2 matrix with rows (a, b) and (c, d).)

When x has infinite order, the group ⟨x⟩ is said to be infinite cyclic. We won’t have
much to say about that case.


Proposition 2.4.3. Let x be an element of finite order n in a group, and let k be an integer
that is written as k = nq + r where q and r are integers and r is in the range 0 ≤ r < n.

• x^k = x^r.
• x^k = 1 if and only if r = 0.
• Let d be the greatest common divisor of k and n. The order of x^k is equal
to n/d. □
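The three assertions of Proposition 2.4.3 are easy to test in a concrete cyclic group. The sketch below is a hedged illustration, not part of the text: it assumes the multiplicative group of nonzero residues modulo the prime 7 and takes x = 3, which has order 6 there. The helper name order is hypothetical.

```python
from math import gcd

def order(g, p):
    """Order of g in the multiplicative group of nonzero residues modulo a prime p."""
    k, power = 1, g % p
    while power != 1:
        power = (power * g) % p
        k += 1
    return k

p, x = 7, 3          # sample choice (an assumption): 3 has order 6 modulo 7
n = order(x, p)

for k in range(1, 25):
    q, r = divmod(k, n)                        # k = nq + r with 0 <= r < n
    assert pow(x, k, p) == pow(x, r, p)        # x^k = x^r
    assert (pow(x, k, p) == 1) == (r == 0)     # x^k = 1 exactly when r = 0
    assert order(pow(x, k, p), p) == n // gcd(k, n)   # order of x^k is n/d
```

The loop raises no assertion error, which is consistent with the proposition for this particular group. It is a check of one example, not a proof.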


One may also speak of the subgroup of a group G generated by a subset U. This is 
the smallest subgroup of G that contains U, and it consists of all elements of G that can be 
expressed as a product of a string of elements of U and of their inverses. A subset U of G 
is said to generate G if every element of G is such a product. For example, we saw in (2.2.7)
that the set U = {x, y} generates the symmetric group S_3. The elementary matrices generate
GL_n (1.2.16). In both of these examples, inverses aren’t needed. That isn’t always true. An
infinite cyclic group ⟨x⟩ is generated by the element x, but negative powers are needed to
fill out the group.

The Klein four group V, the group consisting of the four matrices 


(2.4.4)        [ ±1   0 ]
               [  0  ±1 ],


is the simplest group that is not cyclic. Any two of its elements different from the identity 
generate V. The quaternion group H is another example of a small group. It consists of the 
eight matrices 


(2.4.5)    H = {±1, ±i, ±j, ±k},

where

    1 = [ 1  0 ]     i = [ i   0 ]     j = [  0  1 ]     k = [ 0  i ]
        [ 0  1 ],        [ 0  −i ],        [ −1  0 ],        [ i  0 ].

(Inside the matrices, i denotes the complex number with i^2 = −1.)
These matrices can be obtained from the Pauli matrices of physics by multiplying by i.
The two elements i and j generate H. Computation leads to the formulas

(2.4.6)    i^2 = j^2 = k^2 = −1,   ij = −ji = k,   jk = −kj = i,   ki = −ik = j.
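The formulas (2.4.6) can be confirmed by direct matrix multiplication. The following Python sketch is an illustration under stated assumptions: matrices are encoded as nested tuples, Python's 1j plays the role of the complex number i, and the helper names mul, neg, and generated are introduced here only for this check.

```python
def mul(a, b):
    """Product of two 2x2 matrices given as ((a11, a12), (a21, a22))."""
    return tuple(
        tuple(sum(a[r][t] * b[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )

def neg(a):
    """Negative of a matrix."""
    return tuple(tuple(-entry for entry in row) for row in a)

one = ((1, 0), (0, 1))
i = ((1j, 0), (0, -1j))
j = ((0, 1), (-1, 0))
k = ((0, 1j), (1j, 0))

def generated(gens):
    """All products of one or more generators; in a finite group this is the generated subgroup."""
    elems = set(gens)
    frontier = list(gens)
    while frontier:
        a = frontier.pop()
        for g in gens:
            prod = mul(a, g)
            if prod not in elems:
                elems.add(prod)
                frontier.append(prod)
    return elems

H = generated([i, j])
print(len(H))                      # number of elements generated by i and j
print(mul(i, j) == k, mul(j, i) == neg(k))
```

Because the group is finite, products of i and j alone already produce every element, including the inverses. This matches the remark that i and j generate H.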


2.5 HOMOMORPHISMS


Let G and G’ be groups, written with multiplicative notation. A homomorphism φ: G → G’
is a map from G to G’ such that for all a and b in G,

(2.5.1)    φ(ab) = φ(a)φ(b).

The left side of this equation means

    first multiply a and b in G, then send the product to G’ using the map φ,

while the right side means

    first send a and b individually to G’ using the map φ, then multiply their images in G’.

Intuitively, a homomorphism is a map that is compatible with the laws of composition in the
two groups, and it provides a way to relate different groups.
Examples 2.5.2. The following maps are homomorphisms:

(a) the determinant function det: GL_n(ℝ) → ℝ^× (1.4.10),
(b) the sign homomorphism σ: S_n → {±1} that sends a permutation to its sign (1.5.11),
(c) the exponential map exp: ℝ^+ → ℝ^× defined by x ⇝ e^x,
(d) the map φ: ℤ^+ → G defined by φ(n) = a^n, where a is a given element of G,
(e) the absolute value map | |: ℂ^× → ℝ^×.


In examples (c) and (d), the law of composition is written additively in the domain and
multiplicatively in the range. The condition (2.5.1) for a homomorphism must be rewritten
to take this into account. It becomes

    φ(a + b) = φ(a)φ(b).

The formula showing that the exponential map is a homomorphism is e^{a+b} = e^a e^b.


The following homomorphisms need to be mentioned, though they are less interesting.
The trivial homomorphism φ: G → G’ between any two groups maps every element of G to
the identity in G’. If H is a subgroup of G, the inclusion map i: H → G defined by i(x) = x
for x in H is a homomorphism.


Proposition 2.5.3 Let φ: G → G’ be a group homomorphism.

(a) If a_1, ..., a_k are elements of G, then φ(a_1 ··· a_k) = φ(a_1) ··· φ(a_k).
(b) φ maps the identity to the identity: φ(1_G) = 1_{G’}.
(c) φ maps inverses to inverses: φ(a^{−1}) = φ(a)^{−1}.

Proof. The first assertion follows by induction from the definition. Next, since 1·1 = 1 and
since φ is a homomorphism, φ(1)φ(1) = φ(1·1) = φ(1). We cancel φ(1) from both sides
(2.2.3) to obtain φ(1) = 1. Finally, φ(a^{−1})φ(a) = φ(a^{−1}a) = φ(1) = 1. Hence φ(a^{−1}) is the
inverse of φ(a). □


A group homomorphism determines two important subgroups: its image and its kernel. 


• The image of a homomorphism φ: G → G’, often denoted by im φ, is simply the image of
φ as a map of sets:

(2.5.4)    im φ = {x ∈ G’ | x = φ(a) for some a in G}.

Another notation for the image would be φ(G).

The image of the map ℤ^+ → G that sends n ⇝ a^n is the cyclic subgroup ⟨a⟩ generated
by a.

The image of a homomorphism is a subgroup of the range. We will verify closure and
omit the other verifications. Let x and y be elements of the image. This means that there
are elements a and b in G such that x = φ(a) and y = φ(b). Since φ is a homomorphism,
xy = φ(a)φ(b) = φ(ab). So xy is equal to φ(something). It is in the image too.


• The kernel of a homomorphism is more subtle and also more important. The kernel of φ,
often denoted by ker φ, is the set of elements of G that are mapped to the identity in G’:

(2.5.5)    ker φ = {a ∈ G | φ(a) = 1}.

The kernel is a subgroup of G because, if a and b are in the kernel, then φ(ab) = φ(a)φ(b) =
1·1 = 1, so ab is in the kernel, and so on.


The kernel of the determinant homomorphism GL_n(ℝ) → ℝ^× is the special linear
group SL_n(ℝ) (2.2.11). The kernel of the sign homomorphism S_n → {±1} is called the
alternating group. It consists of the even permutations, and is denoted by A_n:

(2.5.6)    The alternating group A_n is the group of even permutations.


The kernel is important because it controls the entire homomorphism. It tells us not 
only which elements of G are mapped to the identity in G’, but also which pairs of elements 
have the same image in G’. 


• If H is a subgroup of a group G and a is an element of G, the notation aH will stand for
the set of all products ah with h in H:

(2.5.7)    aH = {g ∈ G | g = ah for some h in H}.

This set is called a left coset of H in G, the word “left” referring to the fact that the element
a appears on the left.


Proposition 2.5.8 Let φ: G → G’ be a homomorphism of groups, and let a and b be
elements of G. Let K be the kernel of φ. The following conditions are equivalent:

• φ(a) = φ(b),
• a^{−1}b is in K,
• b is in the coset aK,
• the cosets bK and aK are equal.

Proof. Suppose that φ(a) = φ(b). Then φ(a^{−1}b) = φ(a^{−1})φ(b) = φ(a)^{−1}φ(b) = 1.
Therefore a^{−1}b is in the kernel K. To prove the converse, we turn this argument around.
If a^{−1}b is in K, then 1 = φ(a^{−1}b) = φ(a)^{−1}φ(b), so φ(a) = φ(b). This shows that the first
two bullets are equivalent. Their equivalence with the other bullets follows. □


Corollary 2.5.9 A homomorphism φ: G → G’ is injective if and only if its kernel K is the
trivial subgroup {1} of G.

Proof. If K = {1}, Proposition 2.5.8 shows that φ(a) = φ(b) only when a^{−1}b = 1, i.e., a = b.
Conversely, if φ is injective, then the identity is the only element of G such that φ(a) = 1,
so K = {1}. □


The kernel of a homomorphism has another important property that is explained in
the next proposition. If a and g are elements of a group G, the element gag^{−1} is called the
conjugate of a by g.


Definition 2.5.10 A subgroup N of a group G is a normal subgroup if for every a in N and
every g in G, the conjugate gag^{−1} is in N.


Proposition 2.5.11 The kernel of a homomorphism is a normal subgroup. 


Proof. If a is in the kernel of a homomorphism φ: G → G’ and if g is any element of G,
then φ(gag^{−1}) = φ(g)φ(a)φ(g^{−1}) = φ(g)·1·φ(g)^{−1} = 1. Therefore gag^{−1} is in the kernel
too. □


Thus the special linear group SL_n(ℝ) is a normal subgroup of the general linear group
GL_n(ℝ), and the alternating group A_n is a normal subgroup of the symmetric group S_n.
Every subgroup of an abelian group is normal, because if G is abelian, then gag^{−1} = a for
all a and all g in the group. But subgroups of nonabelian groups needn’t be normal. For
example, in the symmetric group S_3, with its usual presentation (2.2.7), the cyclic subgroup
⟨y⟩ of order two is not normal, because y is in ⟨y⟩, but xyx^{−1} = x^2y isn’t in ⟨y⟩.


• The center of a group G, which is often denoted by Z, is the set of elements that commute
with every element of G:

(2.5.12)    Z = {z ∈ G | zx = xz for all x ∈ G}.

It is always a normal subgroup of G. The center of the special linear group SL_2(ℝ) consists
of the two matrices I, −I. The center of the symmetric group S_n is trivial if n ≥ 3.


Example 2.5.13. A homomorphism φ: S_4 → S_3 between symmetric groups.
There are three ways to partition the set of four indices {1, 2, 3, 4} into pairs of subsets
of order two, namely

(2.5.14)    Π_1: {1, 2} ∪ {3, 4},   Π_2: {1, 3} ∪ {2, 4},   Π_3: {1, 4} ∪ {2, 3}.


An element of the symmetric group S_4 permutes the four indices, and by doing so it
also permutes these three partitions. This defines the map φ from S_4 to the group of
permutations of the set {Π_1, Π_2, Π_3}, which is the symmetric group S_3. For example, the
4-cycle p = (1234) acts on subsets of order two as follows:

    {1, 2} ⇝ {2, 3}    {1, 3} ⇝ {2, 4}    {1, 4} ⇝ {1, 2}
    {2, 3} ⇝ {3, 4}    {2, 4} ⇝ {1, 3}    {3, 4} ⇝ {1, 4}


Looking at this action, one sees that p acts on the set {Π_1, Π_2, Π_3} of partitions as the
transposition (Π_1 Π_3) that fixes Π_2 and interchanges Π_1 and Π_3.

If p and q are elements of S_4, the product pq is the composed permutation p ∘ q,
and the action of pq on the set {Π_1, Π_2, Π_3} is the composition of the actions of q and p.
Therefore φ(pq) = φ(p)φ(q), and φ is a homomorphism.

The map is surjective, so its image is the whole group S_3. Its kernel can be computed.
It is the subgroup of S_4 consisting of the identity and the three products of disjoint
transpositions:

(2.5.15)    K = {1, (12)(34), (13)(24), (14)(23)}. □
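The map φ of Example 2.5.13 is small enough to compute by machine. The Python sketch below is an illustration only. The tuple encoding of permutations (p[i−1] is the image of i) and the names PARTITIONS, compose, and phi are assumptions made here, not notation from the text. It builds φ from the action on the three partitions, tests the homomorphism property on all pairs, and lists the image and the kernel.

```python
from itertools import permutations

# The three partitions Pi_1, Pi_2, Pi_3 of {1, 2, 3, 4} into two pairs.
PARTITIONS = [
    frozenset({frozenset({1, 2}), frozenset({3, 4})}),
    frozenset({frozenset({1, 3}), frozenset({2, 4})}),
    frozenset({frozenset({1, 4}), frozenset({2, 3})}),
]

def compose(p, q):
    """p o q: apply q first, then p. A permutation is a tuple with p[i-1] = p(i)."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def phi(p):
    """Permutation of (Pi_1, Pi_2, Pi_3) induced by p, in the same tuple encoding."""
    images = []
    for part in PARTITIONS:
        moved = frozenset(frozenset(p[i - 1] for i in block) for block in part)
        images.append(PARTITIONS.index(moved) + 1)
    return tuple(images)

S4 = list(permutations(range(1, 5)))

is_homomorphism = all(
    phi(compose(p, q)) == compose(phi(p), phi(q)) for p in S4 for q in S4
)
image = {phi(p) for p in S4}
kernel = [p for p in S4 if phi(p) == (1, 2, 3)]

print(phi((2, 3, 4, 1)))   # the 4-cycle (1234) in tuple form
print(is_homomorphism, len(image), kernel)
```

In this encoding the 4-cycle (1234) is the tuple (2, 3, 4, 1), and its image (3, 2, 1) is the transposition interchanging Π_1 and Π_3. The four kernel elements are the tuple forms of the permutations listed in (2.5.15).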


2.6 ISOMORPHISMS 


An isomorphism φ: G → G’ from a group G to a group G’ is a bijective group
homomorphism, that is, a bijective map such that φ(ab) = φ(a)φ(b) for all a and b in G.


Examples 2.6.1 


• The exponential map e^x is an isomorphism, when it is viewed as a map from the
additive group ℝ^+ to its image, the multiplicative group of positive real numbers.

• If a is an element of infinite order in a group G, the map sending n ⇝ a^n is an
isomorphism from the additive group ℤ^+ to the infinite cyclic subgroup ⟨a⟩ of G.

• The set P of n × n permutation matrices is a subgroup of GL_n, and the map S_n → P
that sends a permutation to its associated matrix (1.5.7) is an isomorphism. □


Corollary 2.5.9 gives us a way to verify that a homomorphism φ: G → G’ is an
isomorphism. To do so, we check that ker φ = {1}, which implies that φ is injective, and also
that im φ = G’, that is, φ is surjective.


Lemma 2.6.2 If φ: G → G’ is an isomorphism, the inverse map φ^{−1}: G’ → G is also an
isomorphism.

Proof. The inverse of a bijective map is bijective. We must show that for all x and y in G’,
φ^{−1}(x)φ^{−1}(y) = φ^{−1}(xy). We set a = φ^{−1}(x), b = φ^{−1}(y), and c = φ^{−1}(xy). What has to
be shown is that ab = c, and since φ is bijective, it suffices to show that φ(ab) = φ(c). Since
φ is a homomorphism,

    φ(ab) = φ(a)φ(b) = xy = φ(c). □


This lemma shows that when φ: G → G’ is an isomorphism, we can make a computation
in either group, then use φ or φ^{−1} to carry it over to the other. So, for computation with the
group law, the two groups have identical properties. To picture this conclusion intuitively, 
suppose that the elements of one of the groups are put into unlabeled boxes, and that 
we have an oracle that tells us, when presented with two boxes, which box contains their 
product. We will have no way to decide whether the elements in the boxes are from G or 
from G’. 

Two groups G and G’ are said to be isomorphic if there exists an isomorphism φ from
G to G’. We sometimes indicate that two groups are isomorphic by the symbol ≈:

(2.6.3)    G ≈ G’ means that G is isomorphic to G’.

Since isomorphic groups have identical properties, it is often convenient to identify them with
each other when speaking informally. For instance, we often blur the distinction between
the symmetric group S_n and the isomorphic group P of permutation matrices.


• The groups isomorphic to a given group G form what is called the isomorphism class of G.

Any two groups in an isomorphism class are isomorphic. When one speaks of classifying
groups, what is meant is to describe these isomorphism classes. This is too hard to do for all
groups, but we will see that every group of prime order p is cyclic. So all groups of order
p are isomorphic. There are two isomorphism classes of groups of order 4 (2.11.5) and five
isomorphism classes of groups of order 12 (7.8.1).


An interesting and sometimes confusing point about isomorphisms is that there exist
isomorphisms φ: G → G from a group G to itself. Such an isomorphism is called an
automorphism. The identity map is an automorphism, of course, but there are nearly always
others. The most important type of automorphism is conjugation: Let g be a fixed element
of a group G. Conjugation by g is the map φ from G to itself defined by

(2.6.4)    φ(x) = gxg^{−1}.


This is an automorphism because, first of all, it is a homomorphism:

    φ(xy) = gxyg^{−1} = gxg^{−1}gyg^{−1} = φ(x)φ(y),

and second, it is bijective because it has an inverse function, namely conjugation by g^{−1}.

If the group is abelian, conjugation by any element g is the identity map: gxg^{−1} = x.
But any noncommutative group has nontrivial conjugations, and so it has automorphisms
different from the identity. For instance, in the symmetric group S_3, presented as usual,
conjugation by y interchanges x and x^2.

As was said before, the element gxg^{−1} is the conjugate of x by g, and two elements
x and x’ of a group G are conjugate if x’ = gxg^{−1} for some g in G. The conjugate gxg^{−1}
behaves in much the same way as the element x itself; for example, it has the same order in
the group. This follows from the fact that it is the image of x by an automorphism. (See the
discussion following Lemma 2.6.2.)


Note: One may sometimes wish to determine whether or not two elements x and y of a
group G are conjugate, i.e., whether or not there is an element g in G such that y = gxg^{−1}.
It is almost always simpler to rewrite the equation to be solved for g as yg = gx.


• The commutator aba^{−1}b^{−1} is another element associated to a pair a, b of elements of a
group.


The next lemma follows by moving things from one side of an equation to the other. 


Lemma 2.6.5 Two elements a and b of a group commute, ab = ba, if and only if aba^{−1} = b,
and this is true if and only if aba^{−1}b^{−1} = 1. □

2.7 EQUIVALENCE RELATIONS AND PARTITIONS 


A fundamental mathematical construction starts with a set S and forms a new set by equating
certain elements of S. For instance, we may divide the set of integers into two classes, the
even integers and the odd integers. The new set we obtain consists of two elements that
could be called Even and Odd. Or, it is common to view congruent triangles in the plane
as equivalent geometric objects. This very general procedure arises in several ways that we
discuss here.


• A partition Π of a set S is a subdivision of S into nonoverlapping, nonempty subsets:

(2.7.1)    S = union of disjoint nonempty subsets.

The two sets Even and Odd partition the set of integers. With the usual notation,
the sets

(2.7.2)    {1}, {y, xy, x^2y}, {x, x^2}

form a partition of the symmetric group S_3.


• An equivalence relation on a set S is a relation that holds between certain pairs of elements
of S. We may write it as a ~ b and speak of it as equivalence of a and b. An equivalence
relation is required to be:

(2.7.3)
• transitive: If a ~ b and b ~ c, then a ~ c.
• symmetric: If a ~ b, then b ~ a.
• reflexive: For all a, a ~ a.


Congruence of triangles is an example of an equivalence relation on the set of triangles
in the plane. If A, B, and C are triangles, and if A is congruent to B and B is congruent to
C, then A is congruent to C, etc.

Conjugacy is an equivalence relation on a group. Two group elements are conjugate,
a ~ b, if b = gag^{−1} for some group element g. We check transitivity: Suppose that a ~ b
and b ~ c. This means that b = g_1ag_1^{−1} and c = g_2bg_2^{−1} for some group elements g_1 and g_2.
Then c = g_2(g_1ag_1^{−1})g_2^{−1} = (g_2g_1)a(g_2g_1)^{−1}, so a ~ c.

The concepts of a partition of S and an equivalence relation on S are logically
equivalent, though in practice one may be presented with just one of the two.


Proposition 2.7.4 An equivalence relation on a set S determines a partition of S, and
conversely.


Proof. Given a partition of S, the corresponding equivalence relation is defined by the rule
that a ~ b if a and b lie in the same subset of the partition. The axioms for an equivalence
relation are obviously satisfied. Conversely, given an equivalence relation, one defines a
partition this way: The subset that contains a is the set of all elements b such that a ~ b. This
subset is called the equivalence class of a. We'll denote it by C_a here:

(2.7.5)    C_a = {b ∈ S | a ~ b}.


The next lemma completes the proof of the proposition. □

Lemma 2.7.6 Given an equivalence relation on a set S, the subsets of S that are equivalence
classes partition S.


Proof. This is an important point, so we will check it carefully. We must remember that the
notation C_a stands for a subset defined in a certain way. The partition consists of the subsets,
and several notations may describe the same subset.

The reflexive axiom tells us that a is in its equivalence class. Therefore the class C_a is
nonempty, and since a can be any element, the union of the equivalence classes is the whole
set S. The remaining property of a partition that must be verified is that equivalence classes
are disjoint. To show this, we show:


(2.7.7)    If C_a and C_b have an element in common, then C_a = C_b.


Since we can interchange the roles of a and b, it will suffice to show that if C_a and C_b have
an element, say d, in common, then C_b ⊂ C_a, i.e., any element x of C_b is also in C_a. If x is
in C_b, then b ~ x. Since d is in both sets, a ~ d and b ~ d, and the symmetry property tells
us that d ~ b. So we have a ~ d, d ~ b, and b ~ x. Two applications of transitivity show that
a ~ x, and therefore that x is in C_a. □
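The construction in the proof of Proposition 2.7.4 and Lemma 2.7.6 can be carried out mechanically on a small example. The sketch below is a hedged illustration, not part of the text: it assumes S_3 encoded as tuples on {0, 1, 2}, uses conjugacy as the equivalence relation, and introduces the hypothetical helper equivalence_classes, which forms each class C_a directly from its definition (2.7.5) and keeps the distinct subsets.

```python
from itertools import permutations

def compose(p, q):
    """p o q on {0, 1, 2}: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

S3 = list(permutations(range(3)))

def conjugate(a, b):
    """True if b = g a g^{-1} for some g in S3."""
    return any(compose(compose(g, a), inverse(g)) == b for g in S3)

def equivalence_classes(elements, related):
    """Distinct classes C_a = {b | a ~ b}; several a may describe the same subset."""
    classes = []
    for a in elements:
        C_a = frozenset(b for b in elements if related(a, b))
        if C_a not in classes:
            classes.append(C_a)
    return classes

classes = equivalence_classes(S3, conjugate)
print(sorted(len(C) for C in classes))
```

The six notations C_a describe only three distinct subsets here. Those subsets are pairwise disjoint and their union is the whole group, as the lemma asserts.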


For example, the relation on a group defined by a ~ b if a and b are elements of the
same order is an equivalence relation. The corresponding partition is exhibited in (2.7.2) for
the symmetric group S_3.

If a partition of a set S is given, we may construct a new set S̄ whose elements are
the subsets. We imagine putting the subsets into separate piles, and we regard the piles as
the elements of our new set S̄. It seems advisable to have a notation to distinguish a subset
from the element of the set S̄ (the pile) that it represents. If U is a subset, we will denote by
[U] the corresponding element of S̄. Thus if S is the set of integers and if Even and Odd
denote the subsets of even and odd integers, respectively, then S̄ contains the two elements
[Even] and [Odd].

We will use this notation more generally. When we want to regard a subset U of S as
an element of a set of subsets of S, we denote it by [U].

When an equivalence relation on S is given, the equivalence classes form a partition,
and we obtain a new set S̄ whose elements are the equivalence classes [C_a]. We can think of
the elements of this new set in another way, as the set obtained by changing what we mean
by equality among elements. If a and b are in S, we interpret a ~ b to mean that a and b
become equal in S̄, because C_a = C_b. With this way of looking at it, the difference between
the two sets S and S̄ is that in S̄ more elements have been declared “equal,” i.e., equivalent.
It seems to me that we often treat congruent triangles this way in school.


For any equivalence relation, there is a natural surjective map

(2.7.8)    π: S → S̄

that maps an element a of S to its equivalence class: π(a) = [C_a]. When we want to regard
S̄ as the set obtained from S by changing the notion of equality, it will be convenient to
denote the element [C_a] of S̄ by the symbol ā. Then the map π becomes

    π(a) = ā.

We can work in S̄ with the symbols used for elements of S, but with bars over them to
remind us of the new rule:

(2.7.9)    If a and b are in S, then ā = b̄ means a ~ b.


A disadvantage of this bar notation is that many symbols represent the same element
of S̄. Sometimes this disadvantage can be overcome by choosing a particular element, a
representative element, in each equivalence class. For example, the even and the odd integers
are often represented by 0̄ and 1̄:

(2.7.10)    {[Even], [Odd]} = {0̄, 1̄}.

Though the pile picture may be easier to grasp at first, the second way of viewing S̄ is often
better because the bar notation is easier to manipulate algebraically.


The Equivalence Relation Defined by a Map 


Any map of sets f: S → T gives us an equivalence relation on its domain S. It is defined by
the rule a ~ b if f(a) = f(b).

• The inverse image of an element t of T is the subset of S consisting of all elements s such
that f(s) = t. It is denoted symbolically as

(2.7.11)    f^{−1}(t) = {s ∈ S | f(s) = t}.

This is symbolic notation. Please remember that unless f is bijective, f^{−1} will not be a map.
The inverse images are also called the fibres of the map f, and the fibres that are not empty
are the equivalence classes for the relation defined above.

Here the set S̄ of equivalence classes has another incarnation, as the image of the map.
The elements of the image correspond bijectively to the nonempty fibres, which are the
equivalence classes.
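The nonempty fibres of a map on a finite set can be collected in a single pass over the domain. The following Python sketch is an illustration under assumptions made here: the helper name fibres is hypothetical, and the sample map sends an element of S_3 (encoded as a tuple on {0, 1, 2}) to its order.

```python
from itertools import permutations

def compose(p, q):
    """p o q on {0, 1, 2}: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def order(p):
    """Smallest positive k such that the k-th power of p is the identity."""
    identity = tuple(range(len(p)))
    k, power = 1, p
    while power != identity:
        power = compose(power, p)
        k += 1
    return k

def fibres(domain, f):
    """Dictionary t -> f^{-1}(t), listing only the nonempty fibres."""
    result = {}
    for s in domain:
        result.setdefault(f(s), set()).add(s)
    return result

S3 = list(permutations(range(3)))
fib = fibres(S3, order)
print({t: len(members) for t, members in fib.items()})
```

The keys of the dictionary are exactly the elements of the image, so the bijection between the image and the nonempty fibres is visible in the data structure itself.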


(2.7.12)    Some Fibres of the Absolute Value Map ℂ^× → ℝ^×.


Example 2.7.13 If G is a finite group, we can define a map f: G → ℕ to the set {1, 2, 3, ...}
of natural numbers, letting f(a) be the order of the element a of G. The fibres of this map
are the sets of elements with the same order (see (2.7.2), for example). □

We go back to a group homomorphism φ: G → G’. The equivalence relation on G
defined by φ is usually denoted by ≡, rather than by ~, and is referred to as congruence:


(2.7.14)    a ≡ b if φ(a) = φ(b).


We have seen that elements a and b of G are congruent, i.e., φ(a) = φ(b), if and only if b is
in the coset aK of the kernel K (2.5.8).


Proposition 2.7.15 Let K be the kernel of a homomorphism φ: G → G’. The fibre of φ that
contains an element a of G is the coset aK of K. These cosets partition the group G, and
they correspond to elements of the image of φ.


(2.7.16)    A Schematic Diagram of a Group Homomorphism.


2.8 COSETS


As before, if H is a subgroup of G and if a is an element of G, the subset

(2.8.1)    aH = {ah | h in H}

is called a left coset. The subgroup H is a particular left coset because H = 1H.
The cosets of H in G are equivalence classes for the congruence relation

(2.8.2)    a ≡ b if b = ah for some h in H.


This is very simple, but let’s verify that congruence is an equivalence relation. 


Transitivity: Suppose that a ≡ b and b ≡ c. This means that b = ah and c = bh’ for some
elements h and h’ of H. Therefore c = ahh’. Since H is a subgroup, hh’ is in H, and thus
a ≡ c.

Symmetry: Suppose a ≡ b, so that b = ah. Then a = bh^{−1} and h^{−1} is in H, so b ≡ a.

Reflexivity: a = a1 and 1 is in H, so a ≡ a.


Notice that we have made use of all the defining properties of a subgroup here: closure,
inverses, and identity.

Corollary 2.8.3 The left cosets of a subgroup H of a group G partition the group.


Proof. The left cosets are the equivalence classes for the congruence relation (2.8.2). □


Keep in mind that the notation aH defines a certain subset of G. As with any
equivalence relation, several notations may define the same subset. For example, in the
symmetric group S_3, with the usual presentation (2.2.6), the element y generates a cyclic
subgroup H = ⟨y⟩ of order 2. There are three left cosets of H in G:


(2.8.4)    H = {1, y} = yH,   xH = {x, xy} = xyH,   x^2H = {x^2, x^2y} = x^2yH.


These sets do partition the group.


Recapitulating, let H be a subgroup of a group G and let a and b be elements of G. 
The following are equivalent: 


(2.8.5)
• b = ah for some h in H, or, a^{−1}b is an element of H,
• b is an element of the left coset aH,
• the left cosets aH and bH are equal.


The number of left cosets of a subgroup is called the index of H in G. The index is
denoted by

(2.8.6)    [G : H].

Thus the index of the subgroup ⟨y⟩ of S_3 is 3. When G is infinite, the index may be infinite
too.


Lemma 2.8.7 All left cosets aH of a subgroup H of a group G have the same order.


Proof. Multiplication by a defines a map H → aH that sends h ⇝ ah. This map is bijective
because its inverse is multiplication by a^{−1}. □


Since the cosets all have the same order, and since they partition the group, we obtain 
the important Counting Formula 


(2.8.8)    |G| = |H| [G : H]

(order of G) = (order of H) (number of cosets),

where, as always, |G| denotes the order of the group. The equality has the obvious meaning
if some terms are infinite. For the subgroup ⟨y⟩ of S_3, the formula reads 6 = 2 · 3.
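The counting formula can be observed directly for the subgroup ⟨y⟩ of S_3. The sketch below is only an illustration: the encoding of S_3 as tuples on {0, 1, 2}, the particular choices of x and y, and the helper name left_coset are assumptions made here. The chosen x and y do satisfy yx = x^2y, and the test checks this relation.

```python
from itertools import permutations

def compose(p, q):
    """p o q on {0, 1, 2}: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))   # the symmetric group S3
e = (0, 1, 2)                      # identity
x = (1, 2, 0)                      # a 3-cycle (one possible choice for x)
y = (1, 0, 2)                      # a transposition (one possible choice for y)
H = [e, y]                         # the cyclic subgroup generated by y

def left_coset(a, H):
    """The left coset aH as a set."""
    return frozenset(compose(a, h) for h in H)

cosets = {left_coset(a, H) for a in G}   # distinct left cosets
index = len(cosets)                      # the index [G : H]

print(len(G), len(H), index)
```

Six elements produce six notations aH, but only three distinct subsets, each of order 2, so that 6 = 2 · 3.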


It follows from the counting formula that the terms on the right side of (2.8.8) divide 
the left side. One of these facts is called Lagrange’s Theorem: 


Theorem 2.8.9 Lagrange’s Theorem. Let H be a subgroup of a finite group G. The order of
H divides the order of G. □


Corollary 2.8.10 The order of an element of a finite group divides the order of the group.

Proof. The order of an element a of a group G is equal to the order of the cyclic subgroup
⟨a⟩ generated by a (Proposition 2.4.2). □


Corollary 2.8.11 Suppose that a group G has prime order p. Let a be any element of G
other than the identity. Then G is the cyclic group ⟨a⟩ generated by a.


Proof. The order of an element a ≠ 1 is greater than 1 and it divides the order of G, which
is the prime integer p. So the order of a is equal to p. This is also the order of the cyclic
subgroup ⟨a⟩ generated by a. Since G has order p, ⟨a⟩ = G. □


This corollary classifies groups of prime order p. They form one isomorphism class, the class 
of the cyclic groups of order p. 


The counting formula can also be applied when a homomorphism φ: G → G’ is given.
As we have seen (2.7.15), the left cosets of the kernel ker φ are the nonempty fibres of the
map φ. They are in bijective correspondence with the elements of the image.


(2.8.12)    [G : ker φ] = |im φ|.


Corollary 2.8.13 Let φ: G → G’ be a homomorphism of finite groups. Then
• |G| = |ker φ| · |im φ|,
• |ker φ| divides |G|, and
• |im φ| divides both |G| and |G’|.


Proof. The first formula is obtained by combining (2.8.8) and (2.8.12), and it implies that
|ker φ| and |im φ| divide |G|. Since the image is a subgroup of G’, Lagrange’s theorem tells
us that its order divides |G’| too. □


For example, the sign homomorphism σ: S_n → {±1} (2.5.2)(b) is surjective, so its
image has order 2. Its kernel, the alternating group A_n, has order n!/2. Half of the elements
of S_n are even permutations, and half are odd permutations.
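The first formula of Corollary 2.8.13 can be checked for the sign homomorphism on a small symmetric group. In the sketch below, which is an illustration only, the sign is computed by counting inversions. This is an assumption made for convenience (the text defines the sign through permutation matrices), and the names sign and compose are introduced here. The case n = 4 is an arbitrary sample.

```python
from itertools import permutations
from math import factorial

def compose(p, q):
    """p o q on {0, ..., n-1}: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def sign(p):
    """Sign of a permutation, computed from the parity of its number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

n = 4                                          # sample size, chosen for illustration
S_n = list(permutations(range(n)))
A_n = [p for p in S_n if sign(p) == 1]         # kernel of the sign map
image = {sign(p) for p in S_n}                 # image of the sign map

print(len(S_n), len(A_n), len(image))
```

For n = 4 the kernel has 12 elements and the image has 2, in agreement with |G| = |ker φ| · |im φ| and with the order n!/2 of the alternating group.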


The counting formula 2.8.8 has an analogue when a chain of subgroups is given.


Proposition 2.8.14 Multiplicative Property of the Index. Let G ⊃ H ⊃ K be subgroups of
a group G. Then [G : K] = [G : H][H : K].


Proof. We will assume that the two indices on the right are finite, say [G : H] = m and
[H : K] = n. The reasoning when one or the other is infinite is similar. We list the m cosets
of H in G, choosing representative elements for each coset, say as g_1H, ..., g_mH. Then
g_1H ∪ ··· ∪ g_mH is a partition of G. Similarly, we choose representative elements for each
coset of K in H, obtaining a partition H = h_1K ∪ ··· ∪ h_nK. Since multiplication by g_i is
an invertible operation, g_iH = g_ih_1K ∪ ··· ∪ g_ih_nK will be a partition of the coset g_iH.
Putting these partitions together, G is partitioned into the mn cosets g_ih_jK. □


Right Cosets 


Let us go back to the definition of cosets. We made the decision to work with left cosets aH.
One can also define right cosets of a subgroup H and repeat the above discussion for them.
The right cosets of a subgroup H of a group G are the sets

(2.8.15)    Ha = {ha | h ∈ H}.

They are equivalence classes for the relation (right congruence)

    a ≡ b if b = ha, for some h in H.

Right cosets also partition the group G, but they aren’t always the same as left cosets. For
instance, the right cosets of the subgroup ⟨y⟩ of S_3 are


(2.8.16)    H = {1, y} = Hy,   Hx = {x, x^2y} = Hx^2y,   Hx^2 = {x^2, xy} = Hxy.


This isn’t the same as the partition (2.8.4) into left cosets. However, if a subgroup is normal,
its right and left cosets are equal.

Proposition 2.8.17 Let H be a subgroup of a group G. The following conditions are
equivalent:


(i) H is a normal subgroup: For all h in H and all g in G, ghg^{−1} is in H.
(ii) For all g in G, gHg^{−1} = H.
(iii) For all g in G, the left coset gH is equal to the right coset Hg.
(iv) Every left coset of H in G is a right coset.


Proof. The notation gHg^{−1} stands for the set of all elements ghg^{−1}, with h in H.

Suppose that H is normal. So (i) holds, and it implies that gHg^{−1} ⊂ H for all g in G.
Substituting g^{−1} for g shows that g^{−1}Hg ⊂ H as well. We multiply this inclusion on the left
by g and on the right by g^{−1} to conclude that H ⊂ gHg^{−1}. Therefore gHg^{−1} = H. This
shows that (i) implies (ii). It is clear that (ii) implies (i). Next, if gHg^{−1} = H, we multiply
this equation on the right by g to conclude that gH = Hg. This shows that (ii) implies (iii).
One sees similarly that (iii) implies (ii). Since (iii) implies (iv) is obvious, it remains only to
check that (iv) implies (iii).

We ask: Under what circumstances can a left coset be equal to a right coset? We recall
that the right cosets partition the group G, and we note that the left coset gH and the right
coset Hg have an element in common, namely g = g·1 = 1·g. So if the left coset gH is
equal to any right coset, that coset must be Hg. □

Proposition 2.8.18

(a) If H is a subgroup of a group G and $g$ is an element of G, the set $gHg^{-1}$ is also a
subgroup.
(b) If a group G has just one subgroup H of order $r$, then that subgroup is normal.

Proof. (a) Conjugation by $g$ is an automorphism of G (see (2.6.4)), and $gHg^{-1}$ is the image
of H. (b) See (2.8.17): $gHg^{-1}$ is a subgroup of order $r$, so it must be equal to H. □


Note: If H is a subgroup of a finite group G, the counting formulas using right cosets or left
cosets are the same, so the number of left cosets is equal to the number of right cosets. This
is also true when G is infinite, though the proof can’t be made by counting (see Exercise
M.8). □

2.9 MODULAR ARITHMETIC


This section contains a brief discussion of one of the most important concepts in number 
theory, congruence of integers. If you have not run across this concept before, you will want 
to read more about it. See, for instance, [Stark]. We work with a fixed positive integer n 
throughout the section. 


• Two integers $a$ and $b$ are said to be congruent modulo $n$,

(2.9.1) $a \equiv b$ modulo $n$,

if $n$ divides $b - a$, or if $b = a + nk$ for some integer $k$. For instance, $2 \equiv 17$ modulo 5.

It is easy to check that congruence is an equivalence relation, so we may consider
the equivalence classes, called congruence classes, that it defines. We use bar notation, and
denote the congruence class of an integer $a$ modulo $n$ by the symbol $\overline{a}$. This congruence
class is the set of integers

(2.9.2) $\overline{a} = \{\ldots, a - n, a, a + n, a + 2n, \ldots\}$.


If $a$ and $b$ are integers, the equation $\overline{a} = \overline{b}$ means that $a \equiv b$ modulo $n$, or that $n$ divides
$b - a$. The congruence class $\overline{0}$ is the subgroup

$\overline{0} = \mathbb{Z}n = \{\ldots, -n, 0, n, 2n, \ldots\} = \{kn \mid k \in \mathbb{Z}\}$

of the additive group $\mathbb{Z}^+$. The other congruence classes are the cosets of this subgroup.
Please note that $\mathbb{Z}n$ is not a right coset; it is a subgroup of $\mathbb{Z}^+$. The notation for a coset of
a subgroup H analogous to $aH$, but using additive notation for the law of composition, is
$a + H = \{a + h \mid h \in H\}$. To simplify notation, we denote the subgroup $\mathbb{Z}n$ by H. Then
the cosets of H, the congruence classes, are the sets

(2.9.3) $a + H = \{a + kn \mid k \in \mathbb{Z}\}$.

The $n$ integers $0, 1, \ldots, n - 1$ are representative elements for the $n$ congruence classes.

Proposition 2.9.4 There are $n$ congruence classes modulo $n$, namely $\overline{0}, \overline{1}, \ldots, \overline{n-1}$. The
index $[\mathbb{Z} : \mathbb{Z}n]$ of the subgroup $\mathbb{Z}n$ in $\mathbb{Z}$ is $n$. □


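Let $\overline{a}$ and $\overline{b}$ be congruence classes represented by integers $a$ and $b$. Their sum is defined
to be the congruence class of $a + b$, and their product is the class of $ab$. In other words, by
definition,

(2.9.5) $\overline{a} + \overline{b} = \overline{a + b}$ and $\overline{a}\,\overline{b} = \overline{ab}$.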


This definition needs some justification, because the same congruence class can be represented
by many different integers. Any integer $a'$ congruent to $a$ modulo $n$ represents the
same class as $a$ does. So it had better be true that if $a' \equiv a$ and $b' \equiv b$, then $a' + b' \equiv a + b$
and $a'b' \equiv ab$. Fortunately, this is so.


Lemma 2.9.6 If $a' \equiv a$ and $b' \equiv b$ modulo $n$, then $a' + b' \equiv a + b$ and $a'b' \equiv ab$
modulo $n$.

Proof. Assume that $a' \equiv a$ and $b' \equiv b$, so that $a' = a + rn$ and $b' = b + sn$ for some
integers $r$ and $s$. Then $a' + b' = a + b + (r + s)n$. This shows that $a' + b' \equiv a + b$. Similarly,
$a'b' = (a + rn)(b + sn) = ab + (as + rb + rns)n$, so $a'b' \equiv ab$. □
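
As a sanity check rather than a proof, the lemma can be tested numerically. The sketch below uses a hypothetical helper `check_lemma` (not from the text) that replaces $a$ and $b$ by other representatives $a + rn$ and $b + sn$ and confirms that sums and products stay in the same congruence class.

```python
def check_lemma(n, a, b, spread=3):
    """Test Lemma 2.9.6 for the representatives a + r*n and b + s*n
    with |r|, |s| <= spread. Returns True if no counterexample is found."""
    for r in range(-spread, spread + 1):
        for s in range(-spread, spread + 1):
            a2, b2 = a + r * n, b + s * n
            if (a2 + b2 - (a + b)) % n != 0:   # n must divide the difference of sums
                return False
            if (a2 * b2 - a * b) % n != 0:     # n must divide the difference of products
                return False
    return True

# Try every pair of classes for every modulus from 1 to 12.
print(all(check_lemma(n, a, b)
          for n in range(1, 13)
          for a in range(n)
          for b in range(n)))
```

The range of moduli and the `spread` parameter are arbitrary choices; the proof above is what covers all cases.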


The associative, commutative, and distributive laws hold for addition and multiplication 
of congruence classes because they hold for addition and multiplication of integers. For 
example, the distributive law is verified as follows: 


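$\overline{a}(\overline{b} + \overline{c}) = \overline{a}\,(\overline{b + c}) = \overline{a(b + c)}$ (definition of + and × for congruence classes)
$= \overline{ab + ac}$ (distributive law in the integers)
$= \overline{ab} + \overline{ac} = \overline{a}\,\overline{b} + \overline{a}\,\overline{c}$ (definition of + and × for congruence classes).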


The verifications of other laws are similar, and we omit them. 


The set of congruence classes modulo n may be denoted by any one of the symbols 
Z/Zn, Z/nZ, or Z/(n). Addition, subtraction, and multiplication in Z/Zn can be made 
explicit by working with integers and taking remainders after division by n. That is what the 
formulas (2.9.5) mean. They tell us that the map 


(2.9.7) $\mathbb{Z} \to \mathbb{Z}/\mathbb{Z}n$

that sends an integer $a$ to its congruence class $\overline{a}$ is compatible with addition and multiplication.
Therefore computations can be made in the integers and then carried over to Z/Zn at the 
end. However, computations are simpler if the numbers are kept small. This can be done by 
computing the remainder after some part of a computation has been made. 

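Thus if $n = 29$, so that $\mathbb{Z}/\mathbb{Z}n = \{\overline{0}, \overline{1}, \overline{2}, \ldots, \overline{28}\}$, then $(\overline{35})(\overline{17} + \overline{7})$ can be computed
as $\overline{35} \cdot \overline{24} = \overline{6} \cdot (\overline{-5}) = \overline{-30} = \overline{-1}$.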
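
The same computation can be reproduced in Python, whose `%` operator returns the remainder in the range 0 to n − 1 when n is positive. This small sketch compares reducing only at the end, reducing the factors first, and using the small representatives 6 and −5 as in the text.

```python
n = 29

direct = (35 * (17 + 7)) % n                  # reduce only at the end
stepwise = ((35 % n) * ((17 + 7) % n)) % n    # reduce each factor first
small = (6 * (-5)) % n                        # use the representatives 6 and -5

print(direct, stepwise, small, (-1) % n)      # 28 28 28 28
```

All three routes land in the class of 28, which is the class of −1 modulo 29.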

In the long run, the bars over the numbers become a nuisance. They are often left off. 
When omitting bars, one just has to remember this rule: 


(2.9.8) To say $a = b$ in $\mathbb{Z}/\mathbb{Z}n$ means that $a \equiv b$ modulo $n$.


Congruences modulo a prime integer have special properties, which we discuss at the 
beginning of the next chapter. 

2.10 THE CORRESPONDENCE THEOREM

Let $\varphi: G \to \mathcal{G}$ be a group homomorphism, and let H be a subgroup of G. We may restrict $\varphi$
to H, obtaining a homomorphism

(2.10.1) $\varphi|_H : H \to \mathcal{G}$.

This means that we take the same map $\varphi$ but restrict its domain: So by definition, if $h$ is in
H, then $[\varphi|_H](h) = \varphi(h)$. (We’ve added brackets around the symbol $\varphi|_H$ for clarity.) The
restriction is a homomorphism because $\varphi$ is one, and the kernel of $\varphi|_H$ is the intersection of
the kernel of $\varphi$ with H:

(2.10.2) $\ker(\varphi|_H) = (\ker \varphi) \cap H$.

This is clear from the definition of the kernel. The image of $\varphi|_H$ is the same as the image
$\varphi(H)$ of H under the map $\varphi$.

The Counting Formula may help to describe the restriction. According to Corollary
(2.8.13), the order of the image divides both $|H|$ and $|\mathcal{G}|$. If $|H|$ and $|\mathcal{G}|$ have no common
factor, $\varphi(H) = \{1\}$, so H is contained in the kernel.


Example 2.10.3 The image of the sign homomorphism $\sigma: S_n \to \{\pm 1\}$ has order 2. If a
subgroup H of the symmetric group $S_n$ has odd order, it will be contained in the kernel
of $\sigma$, the alternating group $A_n$ of even permutations. This will be so when H is the cyclic
subgroup generated by a permutation $q$ that is an element of odd order in the group. Every
permutation whose order in the group is odd, such as an $n$-cycle with $n$ odd, is an even
permutation. A permutation that has even order in the group may be odd or even. □
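
The last two sentences of the example can be checked by brute force in a small case. The sketch below works in $S_5$ (an arbitrary choice), computes the sign by counting inversions, and uses helper names (`compose`, `order`, `sign`) that are introduced here for illustration only.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def order(p):
    """Smallest k >= 1 such that p^k is the identity."""
    identity = tuple(range(len(p)))
    k, q = 1, p
    while q != identity:
        q = compose(p, q)
        k += 1
    return k

def sign(p):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

S5 = list(permutations(range(5)))
odd_order = [p for p in S5 if order(p) % 2 == 1]
signs_of_even_order = {sign(p) for p in S5 if order(p) % 2 == 0}

print(all(sign(p) == 1 for p in odd_order))   # True: odd order forces even permutation
print(sorted(signs_of_even_order))            # [-1, 1]: even order allows both signs
```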


Proposition 2.10.4 Let $\varphi: G \to \mathcal{G}$ be a homomorphism with kernel K and let $\mathcal{H}$ be a
subgroup of $\mathcal{G}$. Denote the inverse image $\varphi^{-1}(\mathcal{H})$ by H. Then H is a subgroup of G that
contains K. If $\mathcal{H}$ is a normal subgroup of $\mathcal{G}$, then H is a normal subgroup of G. If $\varphi$ is
surjective and if H is a normal subgroup of G, then $\mathcal{H}$ is a normal subgroup of $\mathcal{G}$.

For example, let $\varphi$ denote the determinant homomorphism $GL_n(\mathbb{R}) \to \mathbb{R}^\times$. The set of
positive real numbers is a subgroup of $\mathbb{R}^\times$; it is normal because $\mathbb{R}^\times$ is abelian. Its inverse
image, the set of invertible matrices with positive determinant, is a normal subgroup of
$GL_n(\mathbb{R})$.


Proof. This proof is simple, but we must keep in mind that $\varphi^{-1}$ is not a map. By definition,
$\varphi^{-1}(\mathcal{H}) = H$ is the set of elements $x$ of G such that $\varphi(x)$ is in $\mathcal{H}$. First, if $x$ is in the kernel
K, then $\varphi(x) = 1$. Since 1 is in $\mathcal{H}$, $x$ is in H. Thus H contains K. We verify the conditions
for a subgroup.

Closure: Suppose that $x$ and $y$ are in H. Then $\varphi(x)$ and $\varphi(y)$ are in $\mathcal{H}$. Since $\mathcal{H}$ is a subgroup,
$\varphi(x)\varphi(y)$ is in $\mathcal{H}$. Since $\varphi$ is a homomorphism, $\varphi(x)\varphi(y) = \varphi(xy)$. So $\varphi(xy)$ is in $\mathcal{H}$, and
$xy$ is in H.

Identity: 1 is in H because $\varphi(1) = 1$ is in $\mathcal{H}$.

Inverses: Let $x$ be an element of H. Then $\varphi(x)$ is in $\mathcal{H}$, and since $\mathcal{H}$ is a subgroup, $\varphi(x)^{-1}$
is also in $\mathcal{H}$. Since $\varphi$ is a homomorphism, $\varphi(x)^{-1} = \varphi(x^{-1})$, so $\varphi(x^{-1})$ is in $\mathcal{H}$, and $x^{-1}$ is
in H.

Suppose that $\mathcal{H}$ is a normal subgroup. Let $x$ and $g$ be elements of H and G, respectively.
Then $\varphi(gxg^{-1}) = \varphi(g)\varphi(x)\varphi(g)^{-1}$ is a conjugate of $\varphi(x)$, and $\varphi(x)$ is in $\mathcal{H}$. Because
$\mathcal{H}$ is normal, $\varphi(gxg^{-1})$ is in $\mathcal{H}$, and therefore $gxg^{-1}$ is in H.

Suppose that $\varphi$ is surjective, and that H is a normal subgroup of G. Let $a$ be in
$\mathcal{H}$, and let $b$ be in $\mathcal{G}$. There are elements $x$ of H and $y$ of G such that $\varphi(x) = a$
and $\varphi(y) = b$. Since H is normal, $yxy^{-1}$ is in H, and therefore $\varphi(yxy^{-1}) = bab^{-1}$ is
in $\mathcal{H}$. □


Theorem 2.10.5 Correspondence Theorem. Let $\varphi: G \to \mathcal{G}$ be a surjective group
homomorphism with kernel K. There is a bijective correspondence between subgroups of $\mathcal{G}$ and
subgroups of G that contain K:

{subgroups of G that contain K} $\longleftrightarrow$ {subgroups of $\mathcal{G}$}.

This correspondence is defined as follows:

a subgroup H of G that contains K $\leadsto$ its image $\varphi(H)$ in $\mathcal{G}$,
a subgroup $\mathcal{H}$ of $\mathcal{G}$ $\leadsto$ its inverse image $\varphi^{-1}(\mathcal{H})$ in G.

If H and $\mathcal{H}$ are corresponding subgroups, then H is normal in G if and only if $\mathcal{H}$ is normal
in $\mathcal{G}$.
If H and $\mathcal{H}$ are corresponding subgroups, then $|H| = |\mathcal{H}||K|$.


Example 2.10.6 We go back to the homomorphism $\varphi: S_4 \to S_3$ that was defined in Example
2.5.13, and its kernel K (2.5.15).

The group $S_3$ has six subgroups, four of them proper. With the usual presentation,
there is one proper subgroup of order 3, the cyclic group $\langle x\rangle$, and there are three subgroups
of order 2, including $\langle y\rangle$. The Correspondence Theorem tells us that there are four
subgroups of $S_4$ lying strictly between K and $S_4$. Since $|K| = 4$, there is one subgroup of order 12 and there
are three of order 8.

We know a subgroup of order 12, namely the alternating group $A_4$. That is the subgroup
that corresponds to the cyclic group $\langle x\rangle$ of $S_3$.

The subgroups of order 8 can be explained in terms of symmetries of a square. With
vertices of the square labeled as in the figure below, a counterclockwise rotation through
the angle $\pi/2$ corresponds to the 4-cycle (1 2 3 4). Reflection about the diagonal through the
vertex 1 corresponds to the transposition (2 4). These two permutations generate a subgroup
of order 8. The other subgroups of order 8 can be obtained by labeling the vertices in
other ways.

[Figure: a square with vertices labeled 2 (top left), 1 (top right), 3 (bottom left), 4 (bottom right).]

There are also some subgroups of $S_4$ that do not contain K. The Correspondence
Theorem has nothing to say about those subgroups. □
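
The count of subgroups above K can be confirmed by enumeration. The sketch below is an illustration under two stated assumptions: $S_4$ is modeled as permutations of {0, 1, 2, 3}, with K taken to be the identity together with the three products of two disjoint transpositions, and the enumeration relies on the fact that every subgroup of $S_4$ can be generated by at most two elements. The helper `generated_subgroup` is a name introduced here.

```python
from itertools import permutations
from collections import Counter

def compose(p, q):
    """Return the permutation p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generated_subgroup(gens, n=4):
    """Close {identity} under right multiplication by the generators.
    In a finite group this closure is the subgroup they generate."""
    identity = tuple(range(n))
    elems = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in elems:
                elems.add(h)
                frontier.append(h)
    return frozenset(elems)

S4 = list(permutations(range(4)))

# Assumption: two generators suffice for every subgroup of S4.
subgroups = {generated_subgroup((a, b)) for a in S4 for b in S4}

# Assumed model of the kernel K: identity plus the three double transpositions.
K = frozenset({(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)})

orders = Counter(len(H) for H in subgroups if K <= H)
print(sorted(orders.items()))   # [(4, 1), (8, 3), (12, 1), (24, 1)]
```

The six subgroups found have orders 4·1, 4·2 (three times), 4·3, and 4·6, matching the six subgroups of $S_3$ and the formula $|H| = |\mathcal{H}||K|$.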


Proof of the Correspondence Theorem. Let H be a subgroup of G that contains K, and let
$\mathcal{H}$ be a subgroup of $\mathcal{G}$. We must check the following points:

• $\varphi(H)$ is a subgroup of $\mathcal{G}$.
• $\varphi^{-1}(\mathcal{H})$ is a subgroup of G, and it contains K.
• $\mathcal{H}$ is a normal subgroup of $\mathcal{G}$ if and only if $\varphi^{-1}(\mathcal{H})$ is a normal subgroup of G.
• (bijectivity of the correspondence) $\varphi(\varphi^{-1}(\mathcal{H})) = \mathcal{H}$ and $\varphi^{-1}(\varphi(H)) = H$.
• $|\varphi^{-1}(\mathcal{H})| = |\mathcal{H}||K|$.

Since $\varphi(H)$ is the image of the homomorphism $\varphi|_H$, it is a subgroup of $\mathcal{G}$. The second and
third bullets form Proposition 2.10.4.

Concerning the fourth bullet, the equality $\varphi(\varphi^{-1}(\mathcal{H})) = \mathcal{H}$ is true for any surjective
map of sets $\varphi: S \to S'$ and any subset $\mathcal{H}$ of $S'$. Also, $H \subset \varphi^{-1}(\varphi(H))$ is true for any map
$\varphi$ of sets and any subset H of S. We omit the verification of these facts. Then the only
thing remaining to be verified is that $H \supset \varphi^{-1}(\varphi(H))$. Let $x$ be an element of $\varphi^{-1}(\varphi(H))$.
We must show that $x$ is in H. By definition of the inverse image, $\varphi(x)$ is in $\varphi(H)$, say
$\varphi(x) = \varphi(a)$, with $a$ in H. Then $a^{-1}x$ is in the kernel K (2.5.8), and since H contains K,
$a^{-1}x$ is in H. Since both $a$ and $a^{-1}x$ are in H, $x$ is in H too.

We leave the proof of the last bullet as an exercise. □


2.11 PRODUCT GROUPS 


Let G, G’ be two groups. The product set G x G’, the set of pairs of elements (a, a’) with 
a in G and a’ in G’, can be made into a group by component-wise multiplication — that is, 
multiplication of pairs is defined by the rule 


(2.11.1) $(a, a') \cdot (b, b') = (ab, a'b')$.

The pair $(1, 1)$ is the identity, and the inverse of $(a, a')$ is $(a^{-1}, a'^{-1})$. The associative law in
$G \times G'$ follows from the fact that it holds in G and in $G'$.

The group obtained in this way is called the product of G and $G'$ and is denoted by
$G \times G'$. It is related to the two factors G and $G'$ in a simple way that we can sum up in terms
of some homomorphisms

$G \xrightarrow{\;i\;} G \times G' \xrightarrow{\;p\;} G, \qquad G' \xrightarrow{\;i'\;} G \times G' \xrightarrow{\;p'\;} G'.$

They are defined by $i(x) = (x, 1)$, $i'(x') = (1, x')$, $p(x, x') = x$, $p'(x, x') = x'$. The
injective homomorphisms $i$ and $i'$ may be used to identify G and $G'$ with their images, the
subgroups $G \times 1$ and $1 \times G'$ of $G \times G'$. The maps $p$ and $p'$ are surjective, the kernel of $p$ is
$1 \times G'$, and the kernel of $p'$ is $G \times 1$. These are the projections.

It is obviously desirable to decompose a given group G as a product, that is, to find 
groups H and H’ such that G is isomorphic to the product H x H’. The groups H and H’ 
will be simpler, and the relation between H X H’ and its factors is easily understood. It is 
rare that a group is a product, but it does happen occasionally. 

For example, it is rather surprising that a cyclic group of order 6 can be decomposed:
A cyclic group $C_6$ of order 6 is isomorphic to the product $C_2 \times C_3$ of cyclic groups of orders
2 and 3. To see this, say that $C_2 = \langle y\rangle$ and $C_3 = \langle z\rangle$, with $y^2 = 1$ and $z^3 = 1$, and let $x$
denote the element $(y, z)$ of the product group $C_2 \times C_3$. The smallest positive integer $k$ such
that $x^k = (y^k, z^k)$ is the identity $(1, 1)$ is $k = 6$. So $x$ has order 6. Since $C_2 \times C_3$ also has
order 6, it is equal to the cyclic group $\langle x\rangle$. The powers of $x$, in order, are

$(1, 1),\ (y, z),\ (1, z^2),\ (y, 1),\ (1, z),\ (y, z^2).$


There is an analogous statement for a cyclic group of order rs, whenever the two 
integers r and s have no common factor. 


(2.11.2) 


Proposition 2.11.3 Let $r$ and $s$ be relatively prime integers. A cyclic group of order $rs$ is
isomorphic to the product of a cyclic group of order $r$ and a cyclic group of order $s$. □

On the other hand, a cyclic group of order 4 is not isomorphic to a product of two cyclic
groups of order 2. Every element of $C_2 \times C_2$ has order 1 or 2, whereas a cyclic group of order
4 contains two elements of order 4.
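
Both observations can be tested by computing element orders directly. In the sketch below, the cyclic groups are modeled additively as $\mathbb{Z}/r$ and $\mathbb{Z}/s$ (a modeling assumption), and the helpers `element_orders` and `is_cyclic` are names introduced for this illustration.

```python
from math import gcd
from itertools import product

def element_orders(r, s):
    """Orders of all elements of Z/r x Z/s, written additively."""
    orders = {}
    for a, b in product(range(r), range(s)):
        k, x, y = 1, a, b
        while (x, y) != (0, 0):          # keep adding (a, b) until we reach the identity
            x, y = (x + a) % r, (y + b) % s
            k += 1
        orders[(a, b)] = k
    return orders

def is_cyclic(r, s):
    """The product is cyclic exactly when some element has order r*s."""
    return max(element_orders(r, s).values()) == r * s

print(is_cyclic(2, 3))                               # True
print(sorted(set(element_orders(2, 2).values())))    # [1, 2]
print(all(is_cyclic(r, s)
          for r in range(1, 8) for s in range(1, 8)
          if gcd(r, s) == 1))                        # True
```

The element (1, 1) of $\mathbb{Z}/2 \times \mathbb{Z}/3$ plays the role of $x = (y, z)$ in the text and has order 6.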


The next proposition describes product groups. 


Proposition 2.11.4 Let H and K be subgroups of a group G, and let $f: H \times K \to G$ be the
multiplication map, defined by $f(h, k) = hk$. Its image is the set $HK = \{hk \mid h \in H, k \in K\}$.

(a) $f$ is injective if and only if $H \cap K = \{1\}$.
(b) $f$ is a homomorphism from the product group $H \times K$ to G if and only if elements of K
commute with elements of H: $hk = kh$.
(c) If H is a normal subgroup of G, then $HK$ is a subgroup of G.
(d) $f$ is an isomorphism from the product group $H \times K$ to G if and only if $H \cap K = \{1\}$,
$HK = G$, and also H and K are normal subgroups of G.


It is important to note that the multiplication map may be bijective though it isn’t a group
homomorphism. This happens, for instance, when $G = S_3$, and with the usual notation,
$H = \langle x\rangle$ and $K = \langle y\rangle$.
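
The $S_3$ example can be verified mechanically. This sketch models $S_3$ as permutations of {0, 1, 2}, with a 3-cycle standing in for $x$ and a transposition standing in for $y$ (assumed choices), and tests the multiplication map $f(h, k) = hk$ for bijectivity and for the homomorphism property.

```python
from itertools import product

def compose(p, q):
    """Return the permutation p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

e = (0, 1, 2)
x = (1, 2, 0)                      # a 3-cycle (assumed model of x)
y = (1, 0, 2)                      # a transposition (assumed model of y)
H = [e, x, compose(x, x)]          # the subgroup <x>
K = [e, y]                         # the subgroup <y>

def f(h, k):
    """The multiplication map H x K -> G."""
    return compose(h, k)

pairs = list(product(H, K))
image = {f(h, k) for h, k in pairs}
bijective = len(image) == len(pairs) == 6

# Homomorphism property: f((h1,k1)(h2,k2)) == f(h1,k1) f(h2,k2) for all pairs.
is_hom = all(
    f(compose(h1, h2), compose(k1, k2)) == compose(f(h1, k1), f(h2, k2))
    for (h1, k1), (h2, k2) in product(pairs, repeat=2)
)

print(bijective, is_hom)           # True False
```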


Proof. (a) If $H \cap K$ contains an element $x \neq 1$, then $x^{-1}$ is in H, and $f(x^{-1}, x) = 1 = f(1, 1)$,
so $f$ is not injective. Suppose that $H \cap K = \{1\}$. Let $(h_1, k_1)$ and $(h_2, k_2)$ be elements of
$H \times K$ such that $h_1k_1 = h_2k_2$. We multiply both sides of this equation on the left by $h_1^{-1}$ and
on the right by $k_2^{-1}$, obtaining $k_1k_2^{-1} = h_1^{-1}h_2$. The left side is an element of K and the right
side is an element of H. Since $H \cap K = \{1\}$, $k_1k_2^{-1} = h_1^{-1}h_2 = 1$. Then $k_1 = k_2$, $h_1 = h_2$,
and $(h_1, k_1) = (h_2, k_2)$.

(b) Let $(h_1, k_1)$ and $(h_2, k_2)$ be elements of the product group $H \times K$. The product of these
elements in the product group $H \times K$ is $(h_1h_2, k_1k_2)$, and $f(h_1h_2, k_1k_2) = h_1h_2k_1k_2$, while
$f(h_1, k_1)f(h_2, k_2) = h_1k_1h_2k_2$. These elements are equal if and only if $h_2k_1 = k_1h_2$.

(c) Suppose that H is a normal subgroup. We note that $KH$ is a union of the left cosets
$kH$ with $k$ in K, and that $HK$ is a union of the right cosets $Hk$. Since H is normal,
$kH = Hk$, and therefore $HK = KH$. Closure of $HK$ under multiplication follows, because
$HKHK = HHKK = HK$. Also, $(hk)^{-1} = k^{-1}h^{-1}$ is in $KH = HK$. This proves closure of
$HK$ under inverses.

(d) Suppose that H and K satisfy the conditions given. Then $f$ is both injective and surjective,
so it is bijective. According to (b), it is an isomorphism if and only if $hk = kh$ for all $h$ in H
and $k$ in K. Consider the commutator $(hkh^{-1})k^{-1} = h(kh^{-1}k^{-1})$. Since K is normal, the left
side is in K, and since H is normal, the right side is in H. Since $H \cap K = \{1\}$, $hkh^{-1}k^{-1} = 1$,
and $hk = kh$. Conversely, if $f$ is an isomorphism, one may verify the conditions listed in the
isomorphic group $H \times K$ instead of in G. □


We use this proposition to classify groups of order 4: 
Proposition 2.11.5 There are two isomorphism classes of groups of order 4, the class of the
cyclic group $C_4$ of order 4 and the class of the Klein Four Group, which is isomorphic to the
product $C_2 \times C_2$ of two groups of order 2.

Proof. Let G be a group of order 4. The order of any element $x$ of G divides 4, so there are
two cases to consider:

Case 1: G contains an element of order 4. Then G is a cyclic group of order 4.
Case 2: Every element of G except the identity has order 2.

In this case, $x = x^{-1}$ for every element $x$ of G. Let $x$ and $y$ be two elements of G. Then
$xy$ has order 1 or 2, so $xyx^{-1}y^{-1} = (xy)(xy) = 1$. This shows that $x$ and $y$ commute (2.6.5), and
since these are arbitrary elements, G is abelian. So every subgroup is normal. We choose
distinct elements $x$ and $y$ of order 2 in G, and we let H and K be the cyclic groups of order 2 that they
generate. Proposition 2.11.4(d) shows that G is isomorphic to the product group $H \times K$. □


2.12 QUOTIENT GROUPS 


In this section we show that a law of composition can be defined on the set of cosets of a
normal subgroup N of any group G. This law makes the set of cosets of a normal subgroup
into a group, called a quotient group.

Addition of congruence classes of integers modulo $n$ is an example of the quotient
construction. Another familiar example is addition of angles. Every real number represents
an angle, and two real numbers represent the same angle if they differ by an integer multiple
of $2\pi$. The group N of integer multiples of $2\pi$ is a subgroup of the additive group $\mathbb{R}^+$ of real
numbers, and angles correspond naturally to (additive) cosets $\theta + N$ of N in $G = \mathbb{R}^+$. The group
of angles is the quotient group whose elements are the cosets.

The set of cosets of a normal subgroup N of a group G is often denoted by $G/N$.

(2.12.1) $G/N$ is the set of cosets of N in G.

When we regard a coset C as an element of the set of cosets, the bracket notation $[C]$
may be used. If $C = aN$, we may also use the bar notation to denote the element $[C]$ by $\overline{a}$,
and then we would denote the set of cosets by $\overline{G}$:

$\overline{G} = G/N$.


Theorem 2.12.2 Let N be a normal subgroup of a group G, and let $\overline{G}$ denote the set of
cosets of N in G. There is a law of composition on $\overline{G}$ that makes this set into a group, such
that the map $\pi: G \to \overline{G}$ defined by $\pi(a) = \overline{a}$ is a surjective homomorphism whose kernel
is N.

• The map $\pi$ is often referred to as the canonical map from G to $\overline{G}$. The word “canonical”
indicates that this is the only map that we might reasonably be talking about.


The next corollary is very simple, but it is important enough to single out: 


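Corollary 2.12.3 Let N be a normal subgroup of a group G, and let $\overline{G}$ denote the set
of cosets of N in G. Let $\pi: G \to \overline{G}$ be the canonical homomorphism. Let $a_1, \ldots, a_k$ be
elements of G such that the product $a_1 \cdots a_k$ is in N. Then $\overline{a}_1 \cdots \overline{a}_k = \overline{1}$.

Proof. Let $p = a_1 \cdots a_k$. Then $p$ is in N, so $\pi(p) = \overline{p} = \overline{1}$. Since $\pi$ is a homomorphism,
$\overline{a}_1 \cdots \overline{a}_k = \overline{p}$. □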
Proof of Theorem 2.12.2. There are several things to be done. We must

• define a law of composition on $\overline{G}$,
• prove that the law makes $\overline{G}$ into a group,
• prove that $\pi$ is a surjective homomorphism, and
• prove that the kernel of $\pi$ is N.


We use the following notation: If A and B are subsets of a group G, then $AB$ denotes the
set of products $ab$:

(2.12.4) $AB = \{x \in G \mid x = ab \text{ for some } a \in A \text{ and } b \in B\}$.

We will call this a product set, though in some other contexts the phrase “product set” refers
to the set $A \times B$ of pairs of elements.


Lemma 2.12.5 Let N be a normal subgroup of a group G, and let $aN$ and $bN$ be cosets of
N. The product set $(aN)(bN)$ is also a coset. It is equal to the coset $abN$.

We note that the set $(aN)(bN)$ consists of all elements of G that can be written in the
form $anbn'$, with $n$ and $n'$ in N.

Proof. Since N is a subgroup, $NN = N$. Since N is normal, left and right cosets are equal:
$Nb = bN$ (2.8.17). The lemma is proved by the following formal manipulation:

$(aN)(bN) = a(Nb)N = a(bN)N = abNN = abN$. □


This lemma allows us to define multiplication on the set $\overline{G} = G/N$. Using the bracket
notation (2.7.8), the definition is this: If $C_1$ and $C_2$ are cosets, then $[C_1][C_2] = [C_1C_2]$,
where $C_1C_2$ is the product set. The lemma shows that this product set is another coset. To
compute the product $[C_1][C_2]$, take any elements $a$ in $C_1$ and $b$ in $C_2$. Then $C_1 = aN$,
$C_2 = bN$, and $C_1C_2$ is the coset $abN$ that contains $ab$. So we have the very natural formula

(2.12.6) $[aN][bN] = [abN]$ or $\overline{a}\,\overline{b} = \overline{ab}$.

Then by definition of the map $\pi$ in (2.12.2),

(2.12.7) $\pi(a)\pi(b) = \overline{a}\,\overline{b} = \overline{ab} = \pi(ab)$.

The fact that $\pi$ is a homomorphism will follow from (2.12.7), once we show that $\overline{G}$ is a group.
Since the canonical map $\pi$ is surjective (2.7.8), the next lemma proves this.


Lemma 2.12.8 Let G be a group, and let Y be a set with a law of composition, both
laws written with multiplicative notation. Let $\varphi: G \to Y$ be a surjective map with the
homomorphism property, that $\varphi(ab) = \varphi(a)\varphi(b)$ for all $a$ and $b$ in G. Then Y is a group
and $\varphi$ is a homomorphism.

Proof. The group axioms that are true in G are carried over to Y by the surjective map $\varphi$.
Here is the proof of the associative law: Let $y_1, y_2, y_3$ be elements of Y. Since $\varphi$ is surjective,
$y_i = \varphi(x_i)$ for some $x_i$ in G. Then

$(y_1y_2)y_3 = (\varphi(x_1)\varphi(x_2))\varphi(x_3) = \varphi(x_1x_2)\varphi(x_3) = \varphi((x_1x_2)x_3)$
$\overset{*}{=} \varphi(x_1(x_2x_3)) = \varphi(x_1)\varphi(x_2x_3) = \varphi(x_1)(\varphi(x_2)\varphi(x_3)) = y_1(y_2y_3)$.

The equality marked with an asterisk is the associative law in G. The other equalities follow
from the homomorphism property of $\varphi$. The verifications of the other group axioms are
similar. □


The only thing remaining to be verified is that the kernel of the homomorphism $\pi$ is
the subgroup N. Well, $\pi(a) = \pi(1)$ if and only if $\overline{a} = \overline{1}$, or $[aN] = [1N]$, and this is true if
and only if $a$ is an element of N. □


(2.12.9) A Schematic Diagram of Coset Multiplication. 


Note: Our assumption that N be a normal subgroup of G is crucial to Lemma 2.12.5. If H
is not normal, there will be left cosets $C_1$ and $C_2$ of H in G such that the product set $C_1C_2$
does not lie in a single left coset. Going back once more to the subgroup $H = \langle y\rangle$ of $S_3$,
the product set $(1H)(xH)$ contains four elements: $\{1, y\}\{x, xy\} = \{x, xy, x^2y, x^2\}$. It is not
a coset. The subgroup H is not normal. □
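
The contrast between a normal and a non-normal subgroup shows up immediately in a computation of product sets. The sketch below again models $S_3$ as permutations of {0, 1, 2}, with assumed stand-ins for $x$ and $y$; `product_set` is a helper name introduced for this illustration.

```python
def compose(p, q):
    """Return the permutation p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def product_set(A, B):
    """The product set AB = {ab : a in A, b in B}."""
    return {compose(a, b) for a in A for b in B}

e = (0, 1, 2)
x = (1, 2, 0)                          # a 3-cycle (assumed model of x)
y = (1, 0, 2)                          # a transposition (assumed model of y)

H = {e, y}                             # <y>, not normal
xH = {compose(x, h) for h in H}        # the left coset xH
print(len(product_set(H, xH)))         # 4: too large to be a coset of H

N = {e, x, compose(x, x)}              # <x>, normal
yN = {compose(y, n) for n in N}        # the coset yN
print(product_set(yN, yN) == N)        # True: (yN)(yN) = y^2 N = N
```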


The next theorem relates the quotient group construction to a general group homo- 
morphism, and it provides a fundamental method of identifying quotient groups. 


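Theorem 2.12.10 First Isomorphism Theorem. Let $\varphi: G \to G'$ be a surjective group
homomorphism with kernel N. The quotient group $\overline{G} = G/N$ is isomorphic to the image
$G'$. To be precise, let $\pi: G \to \overline{G}$ be the canonical map. There is a unique isomorphism
$\overline{\varphi}: \overline{G} \to G'$ such that $\varphi = \overline{\varphi} \circ \pi$.

[Diagram: the commutative triangle formed by $\varphi: G \to G'$, $\pi: G \to \overline{G}$, and $\overline{\varphi}: \overline{G} \to G'$.]

Proof. The elements of $\overline{G}$ are the cosets of N, and they are also the fibres of the map $\varphi$
(2.7.15). The map $\overline{\varphi}$ referred to in the theorem is the one that sends a nonempty fibre to
its image: $\overline{\varphi}(\overline{x}) = \varphi(x)$. For any surjective map of sets $\varphi: G \to G'$, one can form the set
$\overline{G}$ of fibres, and then one obtains a diagram as above, in which $\overline{\varphi}$ is the bijective map that
sends a fibre to its image. When $\varphi$ is a group homomorphism, $\overline{\varphi}$ is an isomorphism because
$\overline{\varphi}(\overline{a}\,\overline{b}) = \varphi(ab) = \varphi(a)\varphi(b) = \overline{\varphi}(\overline{a})\,\overline{\varphi}(\overline{b})$. □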


Corollary 2.12.11 Let $\varphi: G \to G'$ be a group homomorphism with kernel N and image $H'$.
The quotient group $\overline{G} = G/N$ is isomorphic to the image $H'$. □

Two quick examples: The image of the absolute value map $\mathbb{C}^\times \to \mathbb{R}^\times$ is the group
of positive real numbers, and its kernel is the unit circle U. The theorem asserts that the
quotient group $\mathbb{C}^\times/U$ is isomorphic to the multiplicative group of positive real numbers.
The determinant is a surjective homomorphism $GL_n(\mathbb{R}) \to \mathbb{R}^\times$, whose kernel is the special
linear group $SL_n(\mathbb{R})$. So the quotient $GL_n(\mathbb{R})/SL_n(\mathbb{R})$ is isomorphic to $\mathbb{R}^\times$.
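
The examples above involve infinite groups, but the statement of the theorem can be watched in action on a small finite case. The sketch below uses an example that is not from the text: the reduction map from the additive group $\mathbb{Z}/12$ onto $\mathbb{Z}/4$. It builds the kernel, the cosets, and the induced map on cosets, and checks that the induced map is a well-defined bijection.

```python
n, m = 12, 4                     # m divides n, so a -> a mod m is well defined on Z/n
G = range(n)

def phi(a):
    """Reduction mod m: a surjective homomorphism Z/n -> Z/m (assumed example)."""
    return a % m

# phi respects addition.
additive = all(phi((a + b) % n) == (phi(a) + phi(b)) % m for a in G for b in G)

N = frozenset(a for a in G if phi(a) == 0)                  # the kernel
cosets = {frozenset((a + k) % n for k in N) for a in G}     # the elements of G/N

# phi is constant on each coset, so the induced map on cosets is well defined.
phi_bar = {C: {phi(a) for a in C} for C in cosets}
well_defined = all(len(values) == 1 for values in phi_bar.values())

# Distinct cosets have distinct images, and every element of Z/m is hit.
images = {next(iter(values)) for values in phi_bar.values()}
bijective = well_defined and len(images) == len(cosets) == m

print(sorted(N), len(cosets), additive, well_defined, bijective)
# [0, 4, 8] 4 True True True
```

Each coset is exactly a fibre of the map, which is the observation the proof of the theorem rests on.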

There are also theorems called the Second and the Third Isomorphism Theorems, 
though they are less important. 


There are thus very many different kinds of magnitudes,
which cannot well be enumerated;
and from this arise the different parts of mathematics,
each of which is concerned with a particular kind of magnitude.


—Leonhard Euler 


EXERCISES 


Section 1 Laws of Composition 
1.1. Let S be a set. Prove that the law of composition defined by $ab = a$ for all $a$ and $b$ in S is
associative. For which sets does this law have an identity?

1.2. Prove the properties of inverses that are listed near the end of the section.

1.3. Let $\mathbb{N}$ denote the set $\{1, 2, 3, \ldots\}$ of natural numbers, and let $s: \mathbb{N} \to \mathbb{N}$ be the shift map,
defined by $s(n) = n + 1$. Prove that $s$ has no right inverse, but that it has infinitely many
left inverses.

Section 2 Groups and Subgroups

2.1. Make a multiplication table for the symmetric group $S_3$.

2.2. Let S be a set with an associative law of composition and with an identity element. Prove
that the subset consisting of the invertible elements in S is a group.

2.3. Let $x$, $y$, $z$, and $w$ be elements of a group G.
(a) Solve for $y$, given that $xyz^{-1}w = 1$.
(b) Suppose that $xyz = 1$. Does it follow that $yzx = 1$? Does it follow that $yxz = 1$?
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2.4. 


2.5. 


2.6. 


In which of the following cases is H a subgroup of G? 


(a) G=GL,(C) and H=GL,(R). 

(b) G = R*% and H = {1,-1}. 

(c) G =Z* and H is the set of positive integers. 
(d) G = R% and H is the set of positive reals. 


(e) G = GL2(R) and H is the set of matrices 0 0 


i 5 with age). 


In the definition of a subgroup, the identity element in H is required to be the identity 
of G. One might require only that H have an identity element, not that it need be the 
same as the identity in G. Show that if H has an identity at all, then it is the identity in 
G. Show that the analogous statement is true for inverses. 


Let G be a group. Define an opposite group G° with law of composition a * b as follows: 
The underlying set is the same as G, but the law of composition is a * b = ba. Prove that 
G° is a group. 


Section 3 Subgroups of the Additive Group of Integers 


3.1. Let $a = 123$ and $b = 321$. Compute $d = \gcd(a, b)$, and express $d$ as an integer
combination $ra + sb$.

3.2. Prove that if $a$ and $b$ are positive integers whose sum is a prime $p$, their greatest common
divisor is 1.

3.3. (a) Define the greatest common divisor of a set $\{a_1, \ldots, a_n\}$ of $n$ integers. Prove that it
exists, and that it is an integer combination of $a_1, \ldots, a_n$.
(b) Prove that if the greatest common divisor of $\{a_1, \ldots, a_n\}$ is $d$, then the greatest
common divisor of $\{a_1/d, \ldots, a_n/d\}$ is 1.


Section 4 Cyclic Groups 


4.1. Let $a$ and $b$ be elements of a group G. Assume that $a$ has order 7 and that $a^3b = ba^3$.
Prove that $ab = ba$.

4.2. An $n$th root of unity is a complex number $z$ such that $z^n = 1$.
(a) Prove that the $n$th roots of unity form a cyclic subgroup of $\mathbb{C}^\times$ of order $n$.
(b) Determine the product of all the $n$th roots of unity.

4.3. Let $a$ and $b$ be elements of a group G. Prove that $ab$ and $ba$ have the same order.

4.4. Describe all groups G that contain no proper subgroup.

4.5. Prove that every subgroup of a cyclic group is cyclic. Do this by working with exponents,
and use the description of the subgroups of $\mathbb{Z}^+$.

4.6. (a) Let G be a cyclic group of order 6. How many of its elements generate G? Answer
the same question for cyclic groups of orders 5 and 8.
(b) Describe the number of elements that generate a cyclic group of arbitrary order $n$.

4.7. Let $x$ and $y$ be elements of a group G. Assume that each of the elements $x$, $y$, and $xy$ has
order 2. Prove that the set $H = \{1, x, y, xy\}$ is a subgroup of G, and that it has order 4.

4.8. (a) Prove that the elementary matrices of the first and third types (1.2.4) generate
$GL_n(\mathbb{R})$.
(b) Prove that the elementary matrices of the first type generate $SL_n(\mathbb{R})$. Do the 2 × 2
case first.

4.9. How many elements of order 2 does the symmetric group $S_4$ contain?

4.10. Show by example that the product of elements of finite order in a group need not have
finite order. What if the group is abelian?

4.11. (a) Adapt the method of row reduction to prove that the transpositions generate the
symmetric group $S_n$.
(b) Prove that, for $n \geq 3$, the three-cycles generate the alternating group $A_n$.


Section 5 Homomorphisms 


5.1. Let $\varphi: G \to G'$ be a surjective homomorphism. Prove that if G is cyclic, then $G'$ is cyclic,
and if G is abelian, then $G'$ is abelian.

5.2. Prove that the intersection $K \cap H$ of subgroups of a group G is a subgroup of H, and
that if K is a normal subgroup of G, then $K \cap H$ is a normal subgroup of H.

5.3. Let U denote the group of invertible upper triangular 2 × 2 matrices $A = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$, and
let $\varphi: U \to \mathbb{R}^\times$ be the map that sends $A \leadsto a^2$. Prove that $\varphi$ is a homomorphism, and
determine its kernel and image.

5.4. Let $f: \mathbb{R}^+ \to \mathbb{C}^\times$ be the map $f(x) = e^{ix}$. Prove that $f$ is a homomorphism, and determine
its kernel and image.

5.5. Prove that the $n \times n$ matrices that have the block form $M =$ […], with
A in $GL_r(\mathbb{R})$ and D in $GL_{n-r}(\mathbb{R})$, form a subgroup H of $GL_n(\mathbb{R})$, and that the
map $H \to GL_r(\mathbb{R})$ that sends $M \leadsto A$ is a homomorphism. What is its kernel?

5.6. Determine the center of $GL_n(\mathbb{R})$.
Hint: You are asked to determine the invertible matrices A that commute with every
invertible matrix B. Do not test with a general matrix B. Test with elementary matrices.


Section 6 Isomorphisms 


6.1. Let $G'$ be the group of real matrices of the form $\begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix}$. Is the map $\mathbb{R}^+ \to G'$ that
sends $x$ to this matrix an isomorphism?

6.2. Describe all homomorphisms $\varphi: \mathbb{Z}^+ \to \mathbb{Z}^+$. Determine which are injective, which are
surjective, and which are isomorphisms.

6.3. Show that the functions $f = 1/x$, $g = (x - 1)/x$ generate a group of functions, the law of
composition being composition of functions, that is isomorphic to the symmetric group $S_3$.

6.4. Prove that in a group, the products $ab$ and $ba$ are conjugate elements.

6.5. Decide whether or not the two matrices $A =$ […] and $B =$ […] are conjugate
elements of the general linear group $GL_2(\mathbb{R})$.

6.6. Are the matrices […], […] conjugate elements of the group $GL_2(\mathbb{R})$? Are they
conjugate elements of $SL_2(\mathbb{R})$?

6.7. Let H be a subgroup of G, and let $g$ be a fixed element of G. The conjugate subgroup
$gHg^{-1}$ is defined to be the set of all conjugates $ghg^{-1}$, with $h$ in H. Prove that $gHg^{-1}$ is
a subgroup of G.

6.8. Prove that the map $A \leadsto (A^t)^{-1}$ is an automorphism of $GL_n(\mathbb{R})$.

6.9. Prove that a group G and its opposite group $G^\circ$ (Exercise 2.6) are isomorphic.

6.10. Find all automorphisms of
(a) a cyclic group of order 10, (b) the symmetric group $S_3$.

6.11. Let $a$ be an element of a group G. Prove that if the set $\{1, a\}$ is a normal subgroup of G,
then $a$ is in the center of G.


Section 7 Equivalence Relations and Partitions 


7.1. Let G be a group. Prove that the relation $a \sim b$ if $b = gag^{-1}$ for some $g$ in G is an
equivalence relation on G.

7.2. An equivalence relation on S is determined by the subset R of the set $S \times S$ consisting of
those pairs $(a, b)$ such that $a \sim b$. Write the axioms for an equivalence relation in terms
of the subset R.

7.3. With the notation of Exercise 7.2, is the intersection $R \cap R'$ of two equivalence relations
R and $R'$ an equivalence relation? Is the union?

7.4. A relation R on the set of real numbers can be thought of as a subset of the $(x, y)$-plane.
With the notation of Exercise 7.2, explain the geometric meaning of the reflexive and
symmetric properties.

7.5. With the notation of Exercise 7.2, each of the following subsets R of the $(x, y)$-plane
defines a relation on the set $\mathbb{R}$ of real numbers. Determine which of the axioms (2.7.3)
are satisfied: (a) the set $\{(s, s) \mid s \in \mathbb{R}\}$, (b) the empty set, (c) the locus $\{xy + 1 = 0\}$,
(d) the locus $\{x^2y - xy^2 - x + y = 0\}$.

7.6. How many different equivalence relations can be defined on a set of five elements?


Section 8 Cosets 


8.1. Let $H$ be the cyclic subgroup of the alternating group $A_4$ generated by the permutation
(1 2 3). Exhibit the left and the right cosets of $H$ explicitly.

8.2. In the additive group $\mathbb{R}^n$ of vectors, let $W$ be the set of solutions of a system
of homogeneous linear equations $AX = 0$. Show that the set of solutions of an inhomogeneous
system $AX = B$ is either empty, or else it is an (additive) coset of $W$.

8.3. Does every group whose order is a power of a prime $p$ contain an element of order $p$?

8.4. Does a group of order 35 contain an element of order 5? of order 7?

8.5. A finite group $G$ contains an element $x$ of order 10 and also an element $y$ of order 6.
What can be said about the order of $G$?

8.6. Let $\varphi: G \to G'$ be a group homomorphism. Suppose that $|G| = 18$, $|G'| = 15$, and
that $\varphi$ is not the trivial homomorphism. What is the order of the kernel?


8.7. A group $G$ of order 22 contains elements $x$ and $y$, where $x \neq 1$ and $y$ is not a
power of $x$. Prove that the subgroup generated by these elements is the whole group $G$.

8.8. Let $G$ be a group of order 25. Prove that $G$ has at least one subgroup of order 5, and
that if it contains only one subgroup of order 5, then it is a cyclic group.

8.9. Let $G$ be a finite group. Under what circumstances is the map $\varphi: G \to G$ defined by
$\varphi(x) = x^2$ an automorphism of $G$?

8.10. Prove that every subgroup of index 2 is a normal subgroup, and show by example that a
subgroup of index 3 need not be normal.

8.11. Let $G$ and $H$ be the following subgroups of $GL_2(\mathbb{R})$:
$$G = \left\{ \begin{bmatrix} x & y \\ 0 & 1 \end{bmatrix} \right\}, \quad
H = \left\{ \begin{bmatrix} x & 0 \\ 0 & 1 \end{bmatrix} \right\},$$
with $x$ and $y$ real and $x > 0$. An element of $G$ can be represented by a point in the right
half plane. Make sketches showing the partitions of the half plane into left cosets and into
right cosets of $H$.

8.12. Let $S$ be a subset of a group $G$ that contains the identity element 1, and such that the
left cosets $aS$, with $a$ in $G$, partition $G$. Prove that $S$ is a subgroup of $G$.

8.13. Let $S$ be a set with a law of composition. A partition $\Pi_1 \cup \Pi_2 \cup \cdots$ of
$S$ is compatible with the law of composition if for all $i$ and $j$, the product set
$$\Pi_i \Pi_j = \{xy \mid x \in \Pi_i,\ y \in \Pi_j\}$$
is contained in a single subset $\Pi_k$ of the partition.
(a) The set $\mathbb{Z}$ of integers can be partitioned into the three sets [Pos], [Neg], [{0}].
Discuss the extent to which the laws of composition $+$ and $\times$ are compatible with this
partition.
(b) Describe all partitions of the integers that are compatible with the operation $+$.

Section 9 Modular Arithmetic


9.1. For which integers $n$ does 2 have a multiplicative inverse in $\mathbb{Z}/\mathbb{Z}n$?

9.2. What are the possible values of $a^2$ modulo 4? modulo 8?

9.3. Prove that every integer $a$ is congruent to the sum of its decimal digits modulo 9.

9.4. Solve the congruence $2x \equiv 5$ modulo 9 and modulo 6.

9.5. Determine the integers $n$ for which the pair of congruences $2x - y \equiv 1$ and
$4x + 3y \equiv 2$ modulo $n$ has a solution.

9.6. Prove the Chinese Remainder Theorem: Let $a$, $b$, $u$, $v$ be integers, and assume that the
greatest common divisor of $a$ and $b$ is 1. Then there is an integer $x$ such that
$x \equiv u$ modulo $a$ and $x \equiv v$ modulo $b$.
Hint: Do the case $u = 0$ and $v = 1$ first.

9.7. Determine the order of each of the matrices $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$
and $B = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$ when the matrix entries are interpreted
modulo 3.
Section 10 The Correspondence Theorem


10.1. Describe how to tell from the cycle decomposition whether a permutation is odd or even.

10.2. Let $H$ and $K$ be subgroups of a group $G$.
(a) Prove that the intersection $xH \cap yK$ of two cosets of $H$ and $K$ is either empty or
else is a coset of the subgroup $H \cap K$.
(b) Prove that if $H$ and $K$ have finite index in $G$ then $H \cap K$ also has finite index
in $G$.

10.3. Let $G$ and $G'$ be cyclic groups of orders 12 and 6, generated by elements $x$ and $y$,
respectively, and let $\varphi: G \to G'$ be the map defined by $\varphi(x^i) = y^i$. Exhibit the
correspondence referred to in the Correspondence Theorem explicitly.

10.4. With the notation of the Correspondence Theorem, let $H$ and $H'$ be corresponding
subgroups. Prove that $[G : H] = [G' : H']$.

10.5. With reference to the homomorphism $S_4 \to S_3$ described in Example 2.5.13, determine
the six subgroups of $S_4$ that contain $K$.


Section 11 Product Groups 


11.1. Let $x$ be an element of order $r$ of a group $G$, and let $y$ be an element of $G'$ of
order $s$. What is the order of $(x, y)$ in the product group $G \times G'$?

11.2. What does Proposition 2.11.4 tell us when, with the usual notation for the symmetric
group $S_3$, $K$ and $H$ are the subgroups $\langle y \rangle$ and $\langle x \rangle$?

11.3. Prove that the product of two infinite cyclic groups is not infinite cyclic.

11.4. In each of the following cases, determine whether or not $G$ is isomorphic to the product
group $H \times K$.
(a) $G = \mathbb{R}^\times$, $H = \{\pm 1\}$, $K = \{$positive real numbers$\}$.
(b) $G = \{$invertible upper triangular $2 \times 2$ matrices$\}$, $H = \{$invertible diagonal
matrices$\}$, $K = \{$upper triangular matrices with diagonal entries 1$\}$.
(c) $G = \mathbb{C}^\times$, $H = \{$unit circle$\}$, $K = \{$positive real numbers$\}$.

11.5. Let $G_1$ and $G_2$ be groups, and let $Z_i$ be the center of $G_i$. Prove that the center
of the product group $G_1 \times G_2$ is $Z_1 \times Z_2$.

11.6. Let $G$ be a group that contains normal subgroups of orders 3 and 5, respectively. Prove
that $G$ contains an element of order 15.

11.7. Let $H$ be a subgroup of a group $G$, let $\varphi: G \to H$ be a homomorphism whose
restriction to $H$ is the identity map, and let $N$ be its kernel. What can one say about the
product map $H \times N \to G$?

11.8. Let $G$, $G'$, and $H$ be groups. Establish a bijective correspondence between
homomorphisms $\Phi: H \to G \times G'$ from $H$ to the product group and pairs
$(\varphi, \varphi')$ consisting of a homomorphism $\varphi: H \to G$ and a homomorphism
$\varphi': H \to G'$.

11.9. Let $H$ and $K$ be subgroups of a group $G$. Prove that the product set $HK$ is a subgroup
of $G$ if and only if $HK = KH$.


Section 12 Quotient Groups 


12.1. Show that if a subgroup $H$ of a group $G$ is not normal, there are left cosets $aH$ and
$bH$ whose product is not a coset.

12.2. In the general linear group $GL_3(\mathbb{R})$, consider the subsets
$$H = \begin{bmatrix} 1 & * & * \\ 0 & 1 & * \\ 0 & 0 & 1 \end{bmatrix}, \quad\text{and}\quad
K = \begin{bmatrix} 1 & 0 & * \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
where $*$ represents an arbitrary real number. Show that $H$ is a subgroup of $GL_3$, that $K$
is a normal subgroup of $H$, and identify the quotient group $H/K$. Determine the center
of $H$.

12.3. Let $P$ be a partition of a group $G$ with the property that for any pair of elements
$A$, $B$ of the partition, the product set $AB$ is contained entirely within another element $C$
of the partition. Let $N$ be the element of $P$ that contains 1. Prove that $N$ is a normal
subgroup of $G$ and that $P$ is the set of its cosets.

12.4. Let $H = \{\pm 1, \pm i\}$ be the subgroup of $G = \mathbb{C}^\times$ of fourth roots of
unity. Describe the cosets of $H$ in $G$ explicitly. Is $G/H$ isomorphic to $G$?

12.5. Let $G$ be the group of upper triangular real matrices
$\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$, with $a$ and $d$ different from zero. For each of
the following subsets, determine whether or not $S$ is a subgroup, and whether or not $S$ is a
normal subgroup. If $S$ is a normal subgroup, identify the quotient group $G/S$.
(i) $S$ is the subset defined by $b = 0$.
(ii) $S$ is the subset defined by $d = 1$.
(iii) $S$ is the subset defined by $a = d$.

Miscellaneous Problems 


M.1. Describe the column vectors $(a, c)^t$ that occur as the first column of an integer
matrix $A$ whose inverse is also an integer matrix.

M.2. (a) Prove that every group of even order contains an element of order 2. 
(b) Prove that every group of order 21 contains an element of order 3. 


M.3. Classify groups of order 6 by analyzing the following three cases: 


(i) G contains an element of order 6. 
(ii) G contains an element of order 3 but none of order 6. 
(iii) All elements of G have order 1 or 2. 


M.4. A semigroup $S$ is a set with an associative law of composition and with an identity.
Elements are not required to have inverses, and the Cancellation Law need not hold. A
semigroup $S$ is said to be generated by an element $s$ if the set $\{1, s, s^2, \ldots\}$ of
nonnegative powers of $s$ is equal to $S$. Classify semigroups that are generated by one element.

M.5. Let S be a finite semigroup (see Exercise M.4) in which the Cancellation Law 2.2.3 holds. 
Prove that S is a group. 

*M.6. Let $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_k)$ be points in $k$-dimensional
space $\mathbb{R}^k$. A path from $a$ to $b$ is a continuous function on the unit interval
$[0, 1]$ with values in $\mathbb{R}^k$, a function $X: [0, 1] \to \mathbb{R}^k$, sending
$t \rightsquigarrow X(t) = (x_1(t), \ldots, x_k(t))$, such that $X(0) = a$ and $X(1) = b$. If $S$
is a subset of $\mathbb{R}^k$ and if $a$ and $b$ are in $S$, define $a \sim b$ if $a$ and $b$ can
be joined by a path lying entirely in $S$.
(a) Show that $\sim$ is an equivalence relation on $S$. Be careful to check that any paths you
construct stay within the set $S$.
(b) A subset $S$ is path connected if $a \sim b$ for any two points $a$ and $b$ in $S$. Show that
every subset $S$ is partitioned into path-connected subsets with the property that two points
in different subsets cannot be connected by a path in $S$.
(c) Which of the following loci in $\mathbb{R}^2$ are path-connected: $\{x^2 + y^2 = 1\}$,
$\{xy = 0\}$, $\{xy = 1\}$?

*M.7. The set of $n \times n$ matrices can be identified with the space
$\mathbb{R}^{n \times n}$. Let $G$ be a subgroup of $GL_n(\mathbb{R})$. With the notation of
Exercise M.6, prove:
(a) If $A$, $B$, $C$, $D$ are in $G$, and if there are paths in $G$ from $A$ to $B$ and from $C$
to $D$, then there is a path in $G$ from $AC$ to $BD$.
(b) The set of matrices that can be joined to the identity $I$ forms a normal subgroup of
$G$. (It is called the connected component of $G$.)

*M.8. (a) The group $SL_n(\mathbb{R})$ is generated by elementary matrices of the first type
(see Exercise 4.8). Use this fact to prove that $SL_n(\mathbb{R})$ is path-connected.
(b) Show that $GL_n(\mathbb{R})$ is a union of two path-connected subsets, and describe them.

M.9. (double cosets) Let $H$ and $K$ be subgroups of a group $G$, and let $g$ be an element of
$G$. The set $HgK = \{x \in G \mid x = hgk \text{ for some } h \in H, k \in K\}$ is called a
double coset. Do the double cosets partition $G$?

M.10. Let $H$ be a subgroup of a group $G$. Show that the double cosets (see Exercise M.9)
$$HgH = \{h_1 g h_2 \mid h_1, h_2 \in H\}$$
are the left cosets $gH$ if and only if $H$ is normal.

*M.11. Most invertible matrices can be written as a product $A = LU$ of a lower triangular
matrix $L$ and an upper triangular matrix $U$, where in addition all diagonal entries of $U$
are 1.
(a) Explain how to compute $L$ and $U$ when the matrix $A$ is given.
(b) Prove uniqueness, that there is at most one way to write $A$ as such a product.
(c) Show that every invertible matrix can be written as a product $LPU$, where $L$, $U$ are
as above and $P$ is a permutation matrix.
(d) Describe the double cosets $LgU$ (see Exercise M.9).

M.12. (postage stamp problem) Let $a$ and $b$ be positive, relatively prime integers.
(a) Prove that every sufficiently large positive integer $n$ can be obtained as $ra + sb$,
where $r$ and $s$ are positive integers.
(b) Determine the largest integer that is not of this form.

M.13. (a game) The starting position is the point (1, 1), and a permissible "move" replaces a
point $(a, b)$ by one of the points $(a + b, b)$ or $(a, a + b)$. So the position after the first
move will be either (2, 1) or (1, 2). Determine the points that can be reached.

M.14. (generating $SL_2(\mathbb{Z})$) Prove that the two matrices
$$E = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad
E' = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$$
generate the group $SL_2(\mathbb{Z})$ of all integer matrices with determinant 1. Remember that
the subgroup they generate consists of all elements that can be expressed as products
using the four elements $E$, $E'$, $E^{-1}$, $E'^{-1}$.
Hint: Do not try to write a matrix directly as a product of the generators. Use row
reduction.

M.15. (the semigroup generated by elementary matrices) Determine the semigroup $S$ (see
Exercise M.4) of matrices $A$ that can be written as a product, of arbitrary length, each of
whose terms is one of the two matrices
$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad\text{or}\quad
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.$$
Show that every element of $S$ can be expressed as such a product in exactly one way.

M.16.¹ (the homophonic group: a mathematical diversion) By definition, English words have
the same pronunciation if their phonetic spellings in the dictionary are the same. The
homophonic group $H$ is generated by the letters of the alphabet, subject to the following
relations: English words with the same pronunciation represent equal elements of the
group. Thus be = bee, and since $H$ is a group, we can cancel be to conclude that e = 1.
Try to determine the group $H$.


¹ I learned this problem from a paper by Mestre, Schoof, Washington and Zagier.
CHAPTER 3


Vector Spaces


Immer mit den einfachsten Beispielen anfangen. 


—David Hilbert 


3.1 SUBSPACES OF $\mathbb{R}^n$


Our basic models of vector spaces, the topic of this chapter, are subspaces of the space
$\mathbb{R}^n$ of $n$-dimensional real vectors. We discuss them in this section. The definition
of a vector space is given in Section 3.3.

Though row vectors take up less space, the definition of matrix multiplication makes
column vectors more convenient, so we usually work with them. To save space, we sometimes
use the matrix transpose to write a column vector in the form $(a_1, \ldots, a_n)^t$. As
mentioned in Chapter 1, we don't distinguish a column vector from the point of $\mathbb{R}^n$
with the same coordinates. Column vectors will often be denoted by lowercase letters such as
$v$ or $w$, and if $v$ is equal to $(a_1, \ldots, a_n)^t$, we call $(a_1, \ldots, a_n)^t$ the
coordinate vector of $v$.


We consider two operations on vectors: 


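(3.1.1)
vector addition: $\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix}$, and
scalar multiplication: $c \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} c a_1 \\ \vdots \\ c a_n \end{bmatrix}$.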

These operations make $\mathbb{R}^n$ into a vector space.


A subset $W$ of $\mathbb{R}^n$ (3.1.1) is a subspace if it has these properties:


(3.1.2)
(a) If $w$ and $w'$ are in $W$, then $w + w'$ is in $W$.
(b) If $w$ is in $W$ and $c$ is in $\mathbb{R}$, then $cw$ is in $W$.
(c) The zero vector is in $W$.


There is another way to state the conditions for a subspace: 


(3.1.3) $W$ is not empty, and if $w_1, \ldots, w_r$ are elements of $W$ and $c_1, \ldots, c_r$
are scalars, the linear combination $c_1 w_1 + \cdots + c_r w_r$ is also in $W$.


Systems of homogeneous linear equations provide examples. Given an $m \times n$ matrix
$A$ with coefficients in $\mathbb{R}$, the set of vectors in $\mathbb{R}^n$ whose coordinate
vectors solve the homogeneous equation $AX = 0$ is a subspace, called the nullspace of $A$.
Though this is very simple, we'll check the conditions for a subspace:


• $AX = 0$ and $AY = 0$ imply $A(X + Y) = 0$: If $X$ and $Y$ are solutions, so is $X + Y$.
• $AX = 0$ implies $AcX = 0$: If $X$ is a solution, so is $cX$.
• $A0 = 0$: The zero vector is a solution.
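The three checks above can be tried on a concrete matrix. The following is a minimal sketch; the matrix `A`, the solutions `X` and `Y`, and the scalar `c` are hypothetical values chosen only for illustration, and `matvec` is a helper introduced here.

```python
from fractions import Fraction


def matvec(A, X):
    """Multiply a matrix (a list of rows) by a column vector (a list)."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]


# Hypothetical 2x3 matrix; any matrix would do.
A = [[1, 2, -1],
     [2, 4, -2]]

# Two solutions of AX = 0, found by inspection, and an arbitrary scalar.
X = [1, 0, 1]
Y = [-2, 1, 0]
c = Fraction(5, 3)

zero = [0, 0]
sum_XY = [x + y for x, y in zip(X, Y)]
scaled_X = [c * x for x in X]

closed_under_addition = matvec(A, sum_XY) == zero   # A(X + Y) = 0
closed_under_scaling = matvec(A, scaled_X) == zero  # A(cX) = 0
contains_zero = matvec(A, [0, 0, 0]) == zero        # A0 = 0
```

A numerical check of this kind illustrates the conditions for one choice of vectors; the general statement follows from the distributive and associative laws for matrix multiplication.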


The zero space $W = \{0\}$ and the whole space $W = \mathbb{R}^n$ are subspaces. A subspace is
proper if it is not one of these two. The next proposition describes the proper subspaces of
$\mathbb{R}^2$.


Proposition 3.1.4 Let $W$ be a proper subspace of the space $\mathbb{R}^2$, and let $w$ be a
nonzero vector in $W$. Then $W$ consists of the scalar multiples $cw$ of $w$. Distinct proper
subspaces have only the zero vector in common.


The subspace consisting of the scalar multiples $cw$ of a given nonzero vector $w$ is called the
subspace spanned by $w$. Geometrically, it is a line through the origin in the plane
$\mathbb{R}^2$.


Proof of the proposition. We note first that a subspace $W$ that is spanned by a nonzero
vector $w$ is also spanned by any other nonzero vector $w'$ that it contains. This is true
because if $w' = cw$ with $c \neq 0$, then any multiple $aw$ can also be written in the form
$ac^{-1}w'$. Consequently, if two subspaces $W_1$ and $W_2$ that are spanned by vectors $w_1$
and $w_2$ have a nonzero element $v$ in common, then they are equal.

Next, a subspace $W$ of $\mathbb{R}^2$, not the zero space, contains a nonzero element $w_1$.
Since $W$ is a subspace, it contains the space $W_1$ spanned by $w_1$, and if $W_1 = W$, then $W$
consists of the scalar multiples of one nonzero vector. We show that if $W$ is not equal to
$W_1$, then it is the whole space $\mathbb{R}^2$. Let $w_2$ be an element of $W$ not in $W_1$,
and let $W_2$ be the subspace spanned by $w_2$. Since $W_1 \neq W_2$, these subspaces intersect
only in 0. So neither of the two vectors $w_1$ and $w_2$ is a multiple of the other. Then the
coordinate vectors, call them $A_i$, of $w_i$ aren't proportional, and the $2 \times 2$ block
matrix $A = [A_1 | A_2]$ with these vectors as columns has a nonzero determinant. In that case
we can solve the equation $AX = B$ for the coordinate vector $B$ of an arbitrary vector $v$,
obtaining the linear combination $v = w_1 x_1 + w_2 x_2$. This shows that $W$ is the whole
space $\mathbb{R}^2$. □


It can also be seen geometrically from the parallelogram law for vector addition that
every vector is a linear combination $c_1 w_1 + c_2 w_2$.

[Figure: parallelogram with sides $c_1 w_1$ and $c_2 w_2$ along the directions of $w_1$ and
$w_2$, and diagonal $c_1 w_1 + c_2 w_2$.]


The description of subspaces of $\mathbb{R}^2$ that we have given is clarified in Section 3.4
by the concept of dimension.


3.2 FIELDS 


As mentioned at the beginning of Chapter 1, essentially all that was said about matrix 
operations is true for complex matrices as well as for real ones. Many other number systems 
serve equally well. To describe these number systems, we list the properties of the "scalars"
that are needed, and are led to the concept of a field. We introduce fields here before turning 
to vector spaces, the main topic of the chapter. 

Subfields of the field C of complex numbers are the simplest fields to describe. A 
subfield of C is a subset that is closed under the four operations of addition, subtraction, 
multiplication, and division, and which contains 1. In other words, F is a subfield of C if it 
has these properties: 


(3.2.1) $(+, -, \times, \div, 1)$

• If $a$ and $b$ are in $F$, then $a + b$ is in $F$.
• If $a$ is in $F$, then $-a$ is in $F$.
• If $a$ and $b$ are in $F$, then $ab$ is in $F$.
• If $a$ is in $F$ and $a \neq 0$, then $a^{-1}$ is in $F$.
• 1 is in $F$.

These axioms imply that $1 - 1 = 0$ is an element of $F$. Another way to state them is to say
that $F$ is a subgroup of the additive group $\mathbb{C}^+$, and that the nonzero elements of
$F$ form a subgroup of the multiplicative group $\mathbb{C}^\times$.

Some examples of subfields of C: 
(a) the field R of real numbers, 
(b) the field Q of rational numbers (fractions of integers), 


(c) the field $\mathbb{Q}[\sqrt{2}]$ of all complex numbers of the form $a + b\sqrt{2}$, with
rational numbers $a$ and $b$.


The concept of an abstract field is only slightly harder to grasp than that of a subfield, 
and it contains important new classes of fields, including finite fields. 


Definition 3.2.2 A field $F$ is a set together with two laws of composition
$$F \times F \xrightarrow{+} F \quad\text{and}\quad F \times F \xrightarrow{\times} F$$
called addition: $a, b \rightsquigarrow a + b$ and multiplication: $a, b \rightsquigarrow ab$,
which satisfy these axioms:


(i) Addition makes $F$ into an abelian group $F^+$; its identity element is denoted by 0.


(ii) Multiplication is commutative, and it makes the set of nonzero elements of $F$ into an
abelian group $F^\times$; its identity element is denoted by 1.


(iii) distributive law: For all $a$, $b$, and $c$ in $F$, $a(b + c) = ab + ac$.


The first two axioms describe properties of the two laws of composition, addition and 
multiplication, separately. The third axiom, the distributive law, relates the two laws. 

You will be familiar with the fact that the real numbers satisfy these axioms, but the fact 
that they are the only ones needed for the usual algebraic operations can only be understood 
after some experience. 

The next lemma explains how the zero element multiplies. 


Lemma 3.2.3 Let $F$ be a field.


(a) The elements 0 and 1 of $F$ are distinct.
(b) For all $a$ in $F$, $a0 = 0$ and $0a = 0$.
(c) Multiplication in $F$ is associative, and 1 is an identity element.


Proof. (a) Axiom (ii) implies that 1 is not equal to 0.


(b) Since 0 is the identity for addition, $0 + 0 = 0$. Then $a0 + a0 = a(0 + 0) = a0$. Since
$F^+$ is a group, we can cancel $a0$ to obtain $a0 = 0$, and then $0a = 0$ as well.


(c) Since $F - \{0\}$ is an abelian group, multiplication is associative when restricted to this
subset. We need to show that $a(bc) = (ab)c$ when at least one of the elements is zero. In
that case, (b) shows that the products in question are equal to zero. Finally, the element 1 is
an identity on $F - \{0\}$. Setting $a = 1$ in (b) shows that 1 is an identity on all of $F$. □


Aside from subfields of the complex numbers, the simplest examples of fields are
certain finite fields called prime fields, which we describe next. We saw in the previous
chapter that the set $\mathbb{Z}/n\mathbb{Z}$ of congruence classes modulo an integer $n$ has
laws of addition and multiplication derived from addition and multiplication of integers. All of
the axioms for a field hold for the integers, except for the existence of multiplicative
inverses. And as noted in Section 2.9, such axioms carry over to addition and multiplication of
congruence classes. But the integers aren't closed under division, so there is no reason to
suppose that congruence classes have multiplicative inverses. In fact they needn't. The class
of 2, for example, has no multiplicative inverse modulo 6. It is somewhat surprising that when
$p$ is a prime integer, all nonzero congruence classes modulo $p$ have inverses, and therefore
the set $\mathbb{Z}/p\mathbb{Z}$ is a field. This field is called a prime field, and is often
denoted by $\mathbb{F}_p$.
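The contrast between a composite and a prime modulus can be seen by brute force. The sketch below is only an illustration; the function name `invertible_classes` is introduced here and is not part of the text.

```python
def invertible_classes(n):
    """Return a dict {a: b} with a*b congruent to 1 modulo n.

    Only classes a in 1..n-1 that have an inverse appear as keys.
    """
    inverses = {}
    for a in range(1, n):
        for b in range(1, n):
            if (a * b) % n == 1:
                inverses[a] = b
                break
    return inverses


mod6 = invertible_classes(6)  # composite modulus: some classes are missing
mod7 = invertible_classes(7)  # prime modulus: every nonzero class appears
```

Modulo 6 the class of 2 does not appear among the keys, in agreement with the remark above, while modulo 7 all six nonzero classes do.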

Using bar notation and choosing the usual representative elements for the $p$ congruence
classes,

(3.2.4) $\mathbb{F}_p = \{\bar{0}, \bar{1}, \ldots, \overline{p-1}\} = \mathbb{Z}/p\mathbb{Z}$.


Theorem 3.2.5 Let $p$ be a prime integer. Every nonzero congruence class modulo $p$ has a
multiplicative inverse, and therefore $\mathbb{F}_p$ is a field of order $p$.


We discuss the theorem before giving the proof. 
If $a$ and $b$ are integers, then $\bar{a} \neq 0$ means that $p$ does not divide $a$, and
$\bar{a}\bar{b} = \bar{1}$ means $ab \equiv 1$ modulo $p$. The theorem can be stated in terms of
congruence in this way:


(3.2.6) Let $p$ be a prime, and let $a$ be an integer not divisible by $p$.
There is an integer $b$ such that $ab \equiv 1$ modulo $p$.


Finding the inverse of a congruence class $\bar{a}$ modulo $p$ can be done by trial and error if
$p$ is small. A systematic way is to compute the powers of $\bar{a}$. If $p = 13$ and
$\bar{a} = \bar{3}$, then $\bar{a}^2 = \bar{9}$ and $\bar{a}^3 = \overline{27} = \bar{1}$. We are
lucky: $\bar{a}$ has order 3, and therefore $\bar{3}^{-1} = \bar{3}^2 = \bar{9}$. On the other
hand, the powers of $\bar{6}$ run through every nonzero congruence class modulo 13. Computing
powers may not be the fastest way to find the inverse of $\bar{6}$. But the theorem tells us
that the set $\mathbb{F}_p^\times$ of nonzero congruence classes forms a group. So every element
$\bar{a}$ of $\mathbb{F}_p^\times$ has finite order, and if $\bar{a}$ has order $r$, its inverse
will be $\bar{a}^{\,r-1}$.
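The method of computing powers can be sketched in a few lines. This is an illustration only; the names `order_mod` and `inverse_by_powers` are introduced here, and the code assumes that $p$ is prime and that $a$ is not divisible by $p$.

```python
def order_mod(a, p):
    """Order of the class of a in the multiplicative group modulo a prime p."""
    if a % p == 0:
        raise ValueError("a must not be divisible by p")
    power, r = a % p, 1
    while power != 1:
        power = (power * a) % p  # next power of a, reduced modulo p
        r += 1
    return r


def inverse_by_powers(a, p):
    """If a has order r, its inverse is a^(r-1)."""
    r = order_mod(a, p)
    return pow(a, r - 1, p)


order_of_3 = order_mod(3, 13)            # 3 has order 3 modulo 13
inverse_of_3 = inverse_by_powers(3, 13)  # 3^2 = 9
order_of_6 = order_mod(6, 13)            # the powers of 6 give every class
inverse_of_6 = inverse_by_powers(6, 13)
```

For the class of 6 the loop runs through all twelve nonzero classes before reaching 1, which is why computing powers is not always the fastest route to an inverse.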
To make a proof of the theorem using this reasoning, we need the cancellation law: 


Proposition 3.2.7 Cancellation Law. Let $p$ be a prime integer, and let $\bar{a}$, $\bar{b}$,
and $\bar{c}$ be elements of $\mathbb{F}_p$.


(a) If $\bar{a}\bar{b} = 0$, then $\bar{a} = 0$ or $\bar{b} = 0$.
(b) If $\bar{a} \neq 0$ and if $\bar{a}\bar{b} = \bar{a}\bar{c}$, then $\bar{b} = \bar{c}$.


Proof. (a) We represent the congruence classes $\bar{a}$ and $\bar{b}$ by integers $a$ and $b$,
and we translate into congruence. The assertion to be proved is that if $p$ divides $ab$ then
$p$ divides $a$ or $p$ divides $b$. This is Corollary 2.3.7.


(b) It follows from (a) that if $\bar{a} \neq 0$ and $\bar{a}(\bar{b} - \bar{c}) = 0$, then
$\bar{b} - \bar{c} = 0$. □


Proof of Theorem (3.2.5). Let $\bar{a}$ be a nonzero element of $\mathbb{F}_p$. We consider the
powers $1, \bar{a}, \bar{a}^2, \bar{a}^3, \ldots$ Since there are infinitely many exponents and
only finitely many elements in $\mathbb{F}_p$, there must be two powers that are equal, say
$\bar{a}^m = \bar{a}^n$, where $m < n$. We cancel $\bar{a}^m$ from both sides:
$1 = \bar{a}^{\,n-m}$. Then $\bar{a}^{\,n-m-1}$ is the inverse of $\bar{a}$. □


It will be convenient to drop the bars over the letters in what follows, trusting 
ourselves to remember whether we are working with integers or with congruence classes, 
and remembering the rule (2.9.8): 


If $a$ and $b$ are integers, then $a = b$ in $\mathbb{F}_p$ means $a \equiv b$ modulo $p$.


As with congruences in general, computation in the field $\mathbb{F}_p$ can be done by working
with integers, except that division cannot be carried out in the integers. One can operate
with matrices $A$ whose entries are in a field, and the discussion of Chapter 1 can be
repeated with no essential change.

Suppose we ask for solutions of a system of $n$ linear equations in $n$ unknowns in
the prime field $\mathbb{F}_p$. We represent the system of equations by an integer system,
choosing representatives for the congruence classes, say $AX = B$, where $A$ is an $n \times n$
integer matrix and $B$ is an integer column vector. To solve the system in $\mathbb{F}_p$, we
invert the matrix $A$ modulo $p$. The formula $\operatorname{cof}(A)A = \delta I$, where
$\delta = \det A$ (Theorem 1.6.9), is valid for integer matrices, so it also holds in
$\mathbb{F}_p$ when the matrix entries are replaced by their congruence classes. If the
congruence class of $\delta$ isn't zero, we can invert the matrix $A$ in $\mathbb{F}_p$ by
computing $\delta^{-1}\operatorname{cof}(A)$.


Corollary 3.2.8 Let $AX = B$ be a system of $n$ linear equations in $n$ unknowns, where the
entries of $A$ and $B$ are in $\mathbb{F}_p$, and let $\delta = \det A$. If $\delta$ is not zero,
the system has a unique solution in $\mathbb{F}_p$. □


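Consider, for example, the system $AX = B$, where
$$A = \begin{bmatrix} 8 & 3 \\ 2 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 3 \\ -1 \end{bmatrix}.$$
The coefficients are integers, so $AX = B$ defines a system of equations in $\mathbb{F}_p$ for
any prime $p$. The determinant of $A$ is 42, so the system has a unique solution in
$\mathbb{F}_p$ for all $p$ that do not divide 42, i.e., all $p$ different from 2, 3, and 7. For
instance, $\det A = 3$ when evaluated modulo 13. Since $3^{-1} = 9$ in $\mathbb{F}_{13}$,
$$A^{-1} = \begin{bmatrix} 2 & -1 \\ 8 & 7 \end{bmatrix} \quad\text{and}\quad
X = A^{-1}B = \begin{bmatrix} 7 \\ 4 \end{bmatrix}, \quad\text{modulo } 13.$$
The system has no solution in $\mathbb{F}_2$ or $\mathbb{F}_3$. It happens to have solutions in
$\mathbb{F}_7$, though $\det A \equiv 0$ modulo 7.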
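The cofactor method can be sketched for the $2 \times 2$ case. The code below is an illustration under stated assumptions: it takes the integer system with $A = \begin{bmatrix} 8 & 3 \\ 2 & 6 \end{bmatrix}$ and $B = (3, -1)^t$, which has determinant 42, and the function name `solve_2x2_mod_p` is introduced here. The inverse of the determinant is computed as $\delta^{p-2}$, a standard consequence of the multiplicative group having order $p - 1$, which is a swapped-in shortcut and not a method described in the text.

```python
def solve_2x2_mod_p(A, B, p):
    """Solve AX = B over F_p for a 2x2 integer matrix A, p prime.

    Returns (A_inverse, X) with entries reduced modulo p, or None when
    det A is zero modulo p and the cofactor method does not apply.
    """
    (a, b), (c, d) = A
    det = (a * d - b * c) % p
    if det == 0:
        return None
    det_inv = pow(det, p - 2, p)   # inverse of det in F_p (p prime)
    cof = [[d, -b], [-c, a]]       # cofactor matrix: cof(A) A = det(A) I
    A_inv = [[(det_inv * e) % p for e in row] for row in cof]
    X = [sum(A_inv[i][j] * B[j] for j in range(2)) % p for i in range(2)]
    return A_inv, X


A = [[8, 3], [2, 6]]
B = [3, -1]
result_mod_13 = solve_2x2_mod_p(A, B, 13)
```

The entry 12 that the code produces in the inverse is the class of $-1$ modulo 13. For $p = 2, 3, 7$ the function returns `None`, which only says that the determinant vanishes; whether the system has solutions for those primes needs a separate check.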

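Invertible matrices with entries in the prime field $\mathbb{F}_p$ provide new examples of
finite groups, the general linear groups over finite fields:


$GL_n(\mathbb{F}_p) = \{n \times n$ invertible matrices with entries in $\mathbb{F}_p\}$
$SL_n(\mathbb{F}_p) = \{n \times n$ matrices with entries in $\mathbb{F}_p$ and with determinant 1$\}$


For example, the group of invertible $2 \times 2$ matrices with entries in $\mathbb{F}_2$
contains the six elements
$$GL_2(\mathbb{F}_2) = \left\{
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix},
\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix},
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\right\}.$$
This group is isomorphic to the symmetric group $S_3$. The matrices have been listed in an
order that agrees with our usual list $\{1, x, x^2, y, xy, x^2y\}$ of the elements of $S_3$.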
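The group $GL_2(\mathbb{F}_2)$ is small enough to enumerate. The sketch below lists its elements and checks the defining relations of $S_3$ for one choice of generators; the particular matrices called `x` and `y` are an illustrative choice made here, and other choices would work equally well.

```python
from itertools import product


def mul(A, B, p=2):
    """Product of two 2x2 matrices with entries reduced modulo p."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2))
        for i in range(2)
    )


def det(A, p=2):
    return (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % p


# All 2x2 matrices over F_2 with nonzero determinant.
GL2_F2 = [
    ((a, b), (c, d))
    for a, b, c, d in product(range(2), repeat=4)
    if det(((a, b), (c, d))) != 0
]

I = ((1, 0), (0, 1))
x = ((1, 1), (1, 0))  # illustrative choice of an element of order 3
y = ((0, 1), (1, 0))  # illustrative choice of an element of order 2
x2 = mul(x, x)
listed = [I, x, x2, y, mul(x, y), mul(x2, y)]  # 1, x, x^2, y, xy, x^2 y
```

The relations $x^3 = 1$, $y^2 = 1$, $yx = x^2 y$ hold for these matrices, and the six products in `listed` exhaust the group.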


One property of the prime fields $\mathbb{F}_p$ that distinguishes them from subfields of
$\mathbb{C}$ is that adding 1 to itself a certain number of times, in fact $p$ times, gives
zero. The characteristic of a field $F$ is the order of 1, as an element of the additive group
$F^+$, provided that the order is finite. It is the smallest positive integer $m$ such that the
sum $1 + \cdots + 1$ of $m$ copies of 1 evaluates to zero. If the order is infinite, that is,
$1 + \cdots + 1$ is never 0 in $F$, the field is, somewhat perversely, said to have
characteristic zero. Thus subfields of $\mathbb{C}$ have characteristic zero, while the prime
field $\mathbb{F}_p$ has characteristic $p$.


Lemma 3.2.10 The characteristic of any field $F$ is either zero or a prime number.

Proof. To avoid confusion, we let 0 and 1 denote the additive and the multiplicative identities
in the field $F$, respectively, and if $k$ is a positive integer, we let $\bar{k}$ denote the
sum of $k$ copies of 1. Suppose that the characteristic $m$ is not zero. Then 1 generates a
cyclic subgroup $H$ of $F^+$ of order $m$, and $\bar{m} = 0$. The distinct elements of the
cyclic subgroup $H$ generated by 1 are the elements $\bar{k}$ with $k = 0, 1, \ldots, m-1$
(Proposition 2.4.2). Suppose that $m$ isn't a prime, say $m = rs$, with $1 < r, s < m$. Then
$\bar{r}$ and $\bar{s}$ are in the multiplicative group $F^\times = F - \{0\}$, but the product
$\bar{r}\bar{s}$, which is equal to 0, is not in $F^\times$. This contradicts the fact that
$F^\times$ is a group. Therefore $m$ must be prime. □
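The key step of the proof, that a factorization $m = rs$ produces two nonzero elements with product zero, can be observed in $\mathbb{Z}/n\mathbb{Z}$. The sketch below is an illustration only; both function names are introduced here, and $\mathbb{Z}/6\mathbb{Z}$ serves as an example of a system that fails to be a field for exactly this reason.

```python
def additive_order_of_one(n):
    """Smallest m > 0 such that the sum of m copies of 1 is 0 in Z/nZ."""
    total, m = 1 % n, 1
    while total != 0:
        total = (total + 1) % n
        m += 1
    return m


def zero_divisor_pairs(n):
    """Pairs (r, s) with 1 < r <= s < n and r*s congruent to 0 modulo n."""
    return [(r, s) for r in range(2, n) for s in range(r, n) if (r * s) % n == 0]


order_mod_7 = additive_order_of_one(7)  # prime: Z/7Z is the field F_7
pairs_mod_7 = zero_divisor_pairs(7)     # no nonzero classes multiply to 0
pairs_mod_6 = zero_divisor_pairs(6)     # composite: 2 * 3 = 0 in Z/6Z
```

In $\mathbb{Z}/6\mathbb{Z}$ the classes of 2 and 3 are nonzero but their product is zero, so the nonzero classes are not closed under multiplication.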


The prime fields $\mathbb{F}_p$ have another remarkable property:


Theorem 3.2.11 Structure of the Multiplicative Group. Let $p$ be a prime integer. The
multiplicative group $\mathbb{F}_p^\times$ of the prime field is a cyclic group of order $p - 1$.


We defer the proof of this theorem to Chapter 15, where we prove that the multiplicative
group of every finite field is cyclic (Theorem 15.7.3).

• A generator for the cyclic group $\mathbb{F}_p^\times$ is called a primitive root modulo $p$.


There are two primitive roots modulo 7, namely 3 and 5, and four primitive roots
modulo 11. Dropping bars, the powers $3^0, 3^1, 3^2, \ldots$ of the primitive root 3 modulo 7
list the nonzero elements of $\mathbb{F}_7$ in the following order:


(3.2.12) $\mathbb{F}_7^\times = \{1, 3, 2, 6, 4, 5\} = \{1, 3, 2, -1, -3, -2\}$.


Thus there are two ways to list the nonzero elements of $\mathbb{F}_p$, additively and
multiplicatively. If $\alpha$ is a primitive root modulo $p$,


(3.2.13) $\mathbb{F}_p^\times = \{1, 2, 3, \ldots, p-1\} = \{1, \alpha, \alpha^2, \ldots, \alpha^{p-2}\}$.
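For small primes the primitive roots can be found by listing powers. This is a brute-force sketch meant only to illustrate the definition; the names `powers` and `primitive_roots` are introduced here.

```python
def powers(a, p):
    """The list a^0, a^1, ..., a^(p-2) reduced modulo p."""
    return [pow(a, k, p) for k in range(p - 1)]


def primitive_roots(p):
    """Classes a in 1..p-1 whose powers give all p-1 nonzero classes."""
    return [a for a in range(1, p) if len(set(powers(a, p))) == p - 1]


roots_mod_7 = primitive_roots(7)
powers_of_3 = powers(3, 7)  # multiplicative listing of the nonzero classes
roots_mod_11 = primitive_roots(11)
```

The list `powers_of_3` reproduces the multiplicative ordering of the nonzero elements of $\mathbb{F}_7$ given above.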


3.3 VECTOR SPACES 


Having some examples and the concept of a field, we proceed to the definition of a vector 
space. 


Definition 3.3.1 A vector space $V$ over a field $F$ is a set together with two laws of
composition:

(a) addition: $V \times V \to V$, written $v, w \rightsquigarrow v + w$, for $v$ and $w$ in $V$,

(b) scalar multiplication by elements of the field: $F \times V \to V$, written
$c, v \rightsquigarrow cv$, for $c$ in $F$ and $v$ in $V$.

These laws are required to satisfy the following axioms:


• Addition makes $V$ into a commutative group $V^+$, with identity denoted by 0.
• $1v = v$, for all $v$ in $V$.
• associative law: $(ab)v = a(bv)$, for all $a$ and $b$ in $F$ and all $v$ in $V$.
• distributive laws: $(a + b)v = av + bv$ and $a(v + w) = av + aw$, for all $a$ and $b$ in
$F$ and all $v$ and $w$ in $V$.

The space $F^n$ of column vectors with entries in the field $F$ forms a vector space over $F$,
when addition and scalar multiplication are defined as usual (3.1.1).
Some more examples of real vector spaces (vector spaces over $\mathbb{R}$):


Examples 3.3.2 


(a) Let $V = \mathbb{C}$ be the set of complex numbers. Forget about multiplication of two
complex numbers. Remember only addition $\alpha + \beta$ and multiplication $r\alpha$ of a
complex number $\alpha$ by a real number $r$. These operations make $V$ into a real vector space.


(b) The set of real polynomials $p(x) = a_n x^n + \cdots + a_0$ is a real vector space, with
addition of polynomials and multiplication of polynomials by real numbers as its laws of
composition.

(c) The set of continuous real-valued functions on the real line is a real vector space, with
addition of functions $f + g$ and multiplication of functions by real numbers as its laws
of composition.

(d) The set of solutions of the differential equation $\frac{d^2 y}{dx^2} = -y$ is a real vector
space. □

Each of our examples has more structure than we look at when we view it as a vector space.
This is typical. Any particular example is sure to have extra features that distinguish it from
others, but this isn't a drawback. On the contrary, the strength of the abstract approach lies
in the fact that consequences of the axioms can be applied in many different situations.


Two important concepts, subspace and isomorphism, are analogous to subgroups and 
isomorphisms of groups. As with subspaces of R”, a subspace W of a vector space V 
over a field F is a nonempty subset closed under the operations of addition and scalar 
multiplication. A subspace W is proper if it is neither the whole space V nor the zero 
subspace {0}. For example, the space of solutions of the differential equation (3.3.2)(d) is a 
proper subspace of the space of all continuous functions on the real line. 


Proposition 3.3.3 Let V = F^2 be the vector space of column vectors with entries in a field
F. Every proper subspace W of V consists of the scalar multiples {cw} of a single nonzero
vector w. Distinct proper subspaces have only the zero vector in common.


The proof of Proposition 3.1.4 carries over. □


Example 3.3.4 Let F be the prime field F_p. The space F^2 contains p^2 vectors, p^2 - 1
of which are nonzero. Because there are p - 1 nonzero scalars, the subspace W = {cw}
spanned by a nonzero vector w will contain p - 1 nonzero vectors. Therefore F^2 contains
(p^2 - 1)/(p - 1) = p + 1 proper subspaces. □
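The count in Example 3.3.4 can be checked by brute force for small primes. The following is a minimal sketch, not part of the text's argument; the function name proper_subspaces is our own, and it assumes p is prime so that every proper subspace of F_p^2 is a line {cw}, as in Proposition 3.3.3.

```python
from itertools import product

def proper_subspaces(p):
    """Return the set of lines {cw} through the origin in F_p^2 (p assumed prime)."""
    lines = set()
    for w in product(range(p), repeat=2):
        if w == (0, 0):
            continue
        # The subspace spanned by w: all scalar multiples cw, reduced mod p.
        line = frozenset(((c * w[0]) % p, (c * w[1]) % p) for c in range(p))
        lines.add(line)
    return lines

for p in (2, 3, 5, 7):
    # Each nonzero w generates a line; w and its p - 1 nonzero multiples give the same one.
    print(p, len(proper_subspaces(p)))
```

For each prime tried, the number printed next to p is p + 1, and two distinct lines meet only in the zero vector.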


An isomorphism φ from a vector space V to a vector space V', both over the same field
F, is a bijective map φ: V → V' compatible with the two laws of composition, a bijective
map such that

(3.3.5) φ(v + w) = φ(v) + φ(w) and φ(cv) = cφ(v),

for all v and w in V and all c in F.

Examples 3.3.6

(a) Let F^{n×n} denote the set of n×n matrices with entries in a field F. This set is a vector
space over F, and it is isomorphic to the space of column vectors of length n^2.

(b) If we view the set of complex numbers as a real vector space, as in (3.3.2)(a), the map
φ: R^2 → C sending (a, b)^t ⇝ a + bi is an isomorphism.


3.4 BASES AND DIMENSION 


We discuss the terminology used when working with the operations of addition and scalar 
multiplication in a vector space. The new concepts are span, independence, and basis. 

We work with ordered sets of vectors here. We put curly brackets around unordered 
sets, and we enclose ordered sets with round brackets in order to make the distinction clear. 
Thus the ordered set (v, w) is different from the ordered set (w, v), whereas the unordered 
sets {v, w} and {w, v} are equal. Repetitions are allowed in an ordered set. So (v, v, w) is
an ordered set, and it is different from (v, w), in contrast to the convention for unordered 
sets, where {v, v, w} and {v, w} denote the same sets. 


• Let V be a vector space over a field F, and let S = (v_1, ..., v_n) be an ordered set of
elements of V. A linear combination of S is a vector of the form

(3.4.1) w = c_1v_1 + ... + c_nv_n, with c_i in F.

It is convenient to allow scalars to appear on either side of a vector. We simply agree
that if v is a vector and c is a scalar, then the notations vc and cv stand for the same vector,
the one obtained by scalar multiplication. So v_1c_1 + ... + v_nc_n = c_1v_1 + ... + c_nv_n.

Matrix notation provides a compact way to write a linear combination, and the way we
write ordered sets of vectors is chosen with this in mind. Since its entries are vectors, we call
an array S = (v_1, ..., v_n) a hypervector. Multiplication of two elements of a vector space
is not defined, but we do have scalar multiplication. This allows us to interpret a product of
the hypervector S and a column vector X in F^n, as the matrix product

(3.4.2) SX = (v_1, ..., v_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = v_1x_1 + ... + v_nx_n.

Evaluating the right side by scalar multiplication and vector addition, we obtain another
vector, a linear combination in which the scalar coefficients x_i are on the right.


We carry along the subspace W of R^3 of solutions of the linear equation

(3.4.3) 2x_1 - x_2 - 2x_3 = 0, or AX = 0, where A = (2, -1, -2)

as an example. Two particular solutions w_1 and w_2 are shown below, together with a linear
combination w_1y_1 + w_2y_2.

(3.4.4) w_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, w_2 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, w_1y_1 + w_2y_2 = \begin{pmatrix} y_1 + y_2 \\ 2y_2 \\ y_1 \end{pmatrix}

If we write S = (w_1, w_2) with w_i as in (3.4.4) and Y = (y_1, y_2)^t, then the combination
w_1y_1 + w_2y_2 can be written in matrix form as SY.


• The set of all vectors that are linear combinations of S = (v_1, ..., v_n) forms a subspace
of V, called the subspace spanned by the set.

As in Section 3.1, this span is the smallest subspace of V that contains S, and it will
often be denoted by Span S. The span of a single vector (v_1) is the space of scalar multiples
cv_1 of v_1.

One can define span also for an infinite set of vectors. We discuss this in Section 3.7.
Let’s assume for now that the sets are finite.

Lemma 3.4.5 Let S be an ordered set of vectors of V, and let W be a subspace of V. If
S ⊂ W, then Span S ⊂ W. □


The column space of an m×n matrix with entries in F is the subspace of F^m spanned
by the columns of the matrix. It has an important interpretation:

Proposition 3.4.6 Let A be an m×n matrix, and let B be a column vector, both with entries
in a field F. The system of equations AX = B has a solution for X in F^n if and only if B is in
the column space of A.

Proof. Let A_1, ..., A_n denote the columns of A. For any column vector X = (x_1, ..., x_n)^t,
the matrix product AX is the column vector A_1x_1 + ... + A_nx_n. This is a linear combination
of the columns, an element of the column space, and if AX = B, then B is this linear
combination. □

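The identity AX = A_1x_1 + ... + A_nx_n used in this proof is easy to confirm on a small case. The sketch below is only an illustration: the matrix A and the vector X are arbitrary choices of ours, and the helper names matvec and column_combination are hypothetical.

```python
def matvec(A, X):
    """Row-by-row matrix-vector product AX."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def column_combination(A, X):
    """The same vector, computed as A_1 x_1 + ... + A_n x_n."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):
        column = [A[i][j] for i in range(m)]          # the column A_j
        result = [r + c * X[j] for r, c in zip(result, column)]
    return result

A = [[1, 2, 0],
     [0, 1, 3]]      # an arbitrary 2x3 example, not taken from the text
X = [2, -1, 1]
B = matvec(A, X)
print(B, column_combination(A, X))
```

Both computations give the same vector B, so this B lies in the column space of A, and X is a solution of AX = B.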
A linear relation among vectors v_1, ..., v_n is any linear combination that evaluates to
zero, that is, any equation of the form

(3.4.7) v_1x_1 + v_2x_2 + ... + v_nx_n = 0

that holds in V, where the coefficients x_i are in F. A linear relation can be useful because, if
x_n is not zero, the equation (3.4.7) can be solved for v_n.

Definition 3.4.8 An ordered set of vectors S = (v_1, ..., v_n) is independent, or linearly
independent, if there is no linear relation SX = 0 except for the trivial one in which X = 0,
i.e., in which all the coefficients x_i are zero. A set that is not independent is dependent.

An independent set S cannot have any repetitions. If two vectors v_i and v_j of S are
equal, then v_i - v_j = 0 is a linear relation of the form (3.4.7), the other coefficients being
zero. Also, no vector v_i in an independent set is zero, because if v_i is zero, then v_i = 0 is a
linear relation.


Lemma 3.4.9

(a) A set (v_1) of one vector is independent if and only if v_1 ≠ 0.
(b) A set (v_1, v_2) of two vectors is independent if neither vector is a multiple of the other.
(c) Any reordering of an independent set is independent. □

Suppose that V is the space F^m and that we know the coordinate vectors of the vectors
in the set S = (v_1, ..., v_n). Then the equation SX = 0 gives us a system of m homogeneous
linear equations in the n unknowns x_i, and we can decide independence by solving this
system.


Example 3.4.10 Let S = (v_1, v_2, v_3, v_4) be the set of vectors in R^3 whose coordinate vectors
are

(3.4.11) A_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, A_2 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, A_3 = \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}, A_4 = \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix}.

Let A denote the matrix made up of these column vectors:

(3.4.12) A = \begin{pmatrix} 1 & 1 & 2 & 1 \\ 0 & 2 & 1 & 1 \\ 1 & 0 & 2 & 3 \end{pmatrix}.

A linear combination will have the form SX = v_1x_1 + v_2x_2 + v_3x_3 + v_4x_4, and its coordinate
vector will be AX = A_1x_1 + A_2x_2 + A_3x_3 + A_4x_4. The homogeneous equation AX = 0 has a
nontrivial solution because it is a system of three homogeneous equations in four unknowns.
So the set S is dependent. On the other hand, the determinant of the 3×3 matrix A' formed
from the first three columns of (3.4.12) is equal to 1, so the equation A'X = 0 has only the
trivial solution. Therefore (v_1, v_2, v_3) is an independent set. □
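Both claims of Example 3.4.10 can be verified directly. In the sketch below the matrix is the one read off from (3.4.12); the particular nontrivial solution X = (7, 2, -5, 1)^t is our own, found by setting x_4 = 1 and solving for the rest, and the helper names det3 and matvec are hypothetical.

```python
A = [[1, 1, 2, 1],
     [0, 2, 1, 1],
     [1, 0, 2, 3]]

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def matvec(M, X):
    """Matrix-vector product MX."""
    return [sum(m * x for m, x in zip(row, X)) for row in M]

A_prime = [row[:3] for row in A]     # the first three columns of A
X = [7, 2, -5, 1]                    # a nontrivial candidate solution of AX = 0

print(det3(A_prime))                 # nonzero, so (v_1, v_2, v_3) is independent
print(matvec(A, X))                  # the zero vector, so S is dependent
```

The relation found here reads 7v_1 + 2v_2 - 5v_3 + v_4 = 0, which also expresses v_4 as a combination of the first three vectors.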


Definition 3.4.13 A basis of a vector space V is a set (v_1, ..., v_n) of vectors that is
independent and also spans V.

We will often use a boldface symbol such as B to denote a basis. The set (v_1, v_2, v_3)
defined above is a basis of R^3 because the equation A'X = B has a unique solution for all
B (see 1.2.21). The set (w_1, w_2) defined in (3.4.4) is a basis of the space of solutions of the
equation 2x_1 - x_2 - 2x_3 = 0, though we haven’t verified this.


Proposition 3.4.14 The set B = (v_1, ..., v_n) is a basis of V if and only if every vector w in
V can be written in a unique way as a combination w = v_1x_1 + ... + v_nx_n = BX.

Proof. The definition of independence can be restated by saying that the zero vector can be
written as a linear combination in just one way. If every vector can be written uniquely as a
combination, then B is independent, and spans V, so it is a basis. Conversely, suppose that B
is a basis. Then every vector w in V can be written as a linear combination. Suppose that w
is written as a combination in two ways, say w = BX = BX'. Let Y = X - X'. Then BY = 0.
This is a linear relation among the vectors v_1, ..., v_n, which are independent. Therefore
X - X' = 0. The two combinations are the same. □

Let V = F^n be the space of column vectors. As before, e_i denotes the column vector
with 1 in the ith position and zeros elsewhere (see (1.1.24)). The set E = (e_1, ..., e_n) is
a basis for F^n called the standard basis. If the coordinate vector of a vector v in F^n is
X = (x_1, ..., x_n)^t, then v = EX = e_1x_1 + ... + e_nx_n is the unique expression for v in terms
of the standard basis.


We now discuss the main facts that relate the three concepts, of span, independence, 
and basis. The most important one is Theorem 3.4.18. 


Proposition 3.4.15 Let S = (v_1, ..., v_n) be an ordered set of vectors, let w be any vector in
V, and let S' = (S, w) be the set obtained by adding w to S.

(a) Span S = Span S' if and only if w is in Span S.
(b) Suppose that S is independent. Then S' is independent if and only if w is not in Span S.

Proof. This is very elementary, so we omit most of the proof. We show only that if S is
independent but S' is not, then w is in the span of S. If S' is dependent, there is some linear
relation

v_1x_1 + ... + v_nx_n + wy = 0,

in which the coefficients x_1, ..., x_n and y are not all zero. If the coefficient y were zero,
the expression would reduce to SX = 0, and since S is assumed to be independent, we could
conclude that X = 0 too. The relation would be trivial, contrary to our hypothesis. So y ≠ 0,
and then we can solve for w as a linear combination of v_1, ..., v_n. □

• A vector space V is finite-dimensional if some finite set of vectors spans V. Otherwise, V
is infinite-dimensional.


For the rest of this section, our vector spaces are finite-dimensional. 


Proposition 3.4.16 Let V be a finite-dimensional vector space. 


(a) Let S be a finite subset that spans V, and let L be an independent subset of V. One can 
obtain a basis of V by adding elements of S to L. 

(b) Let S be a finite subset that spans V. One can obtain a basis of V by deleting elements 
from S. 


Proof. (a) If S is contained in Span L, then L spans V, and so it is a basis (3.4.5). If not, 
we choose an element v in S, which is not in Span L. By Proposition 3.4.15, L’ = (L, v) 
is independent. We replace L by L’. Since S is finite, we can do this only finitely often. So 
eventually we will have a basis. 


(b) If S is dependent, there is a linear relation v_1c_1 + ... + v_nc_n = 0 in which some coefficient,
say c_n, is not zero. We can solve this equation for v_n, and this shows that v_n is in the span of
the set S_1 of the first n - 1 vectors. Proposition 3.4.15(a) shows that Span S = Span S_1. So S_1
spans V. We replace S by S_1. Continuing this way we must eventually obtain a family that is
independent but still spans V: a basis.

Note: There is a problem with this reasoning when V is the zero vector space {0}. Starting
with an arbitrary set S of vectors in V, all equal to zero, our procedure will throw them
out one at a time until there is only one vector v_1 left. And since v_1 is zero, the set (v_1) is
dependent. How can we proceed? The zero space isn’t particularly interesting, but it may
lurk in a corner, ready to trip us up. We have to allow for the possibility that a vector space
that arises in the course of some computation, such as solving a system of homogeneous
linear equations, is the zero space, though we aren’t aware of this. In order to avoid having
to mention this possibility as a special case, we adopt the following definitions:

(3.4.17)
• The empty set is independent.
• The span of the empty set is the zero space {0}.

Then the empty set is a basis for the zero vector space. These definitions allow us to throw
out the last vector v_1, which rescues the proof. □
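The procedure in the proof of Proposition 3.4.16(a) can be carried out mechanically when the vectors are given by coordinates. The following is a sketch under our own assumptions: vectors are tuples of rational numbers, independence is tested by Gaussian elimination, and the names is_independent and extend_to_basis are hypothetical. The empty list counts as independent, in line with (3.4.17).

```python
from fractions import Fraction

def is_independent(vectors):
    """True if the given list of vectors in Q^m is linearly independent."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank == len(vectors)

def extend_to_basis(L, S):
    """Add elements of the spanning set S to the independent set L, as in 3.4.16(a)."""
    basis = list(L)
    for v in S:
        # By 3.4.15(b), (basis, v) is independent exactly when v is not in Span(basis).
        if is_independent(basis + [v]):
            basis.append(v)
    return basis

S = [(1, 0, 1), (1, 2, 0), (2, 1, 2), (1, 1, 3)]   # the vectors of (3.4.11)
print(extend_to_basis([], S))
```

Starting from the empty set, the pass over S keeps the first three vectors and discards the fourth. Applied to a set of zero vectors, the same function returns the empty list, which is the basis of the zero space.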


We come now to the main fact about independence: 


Theorem 3.4.18 Let S and L be finite subsets of a vector space V. Assume that S spans V
and that L is independent. Then S contains at least as many elements as L does: |S| ≥ |L|.


As before, |S| denotes the order, the number of elements, of the set S. 


Proof. Say that S = (v_1, ..., v_m) and that L = (w_1, ..., w_n). We assume that |S| < |L|,
i.e., that m < n, and we show that L is dependent. To do this, we show that there is a linear
relation w_1x_1 + ... + w_nx_n = 0, in which the coefficients x_i aren’t all zero. We write this
undetermined relation as LX = 0.

Because S spans V, each element w_j of L is a linear combination of S, say w_j =
v_1a_{1j} + ... + v_ma_{mj} = SA_j, where A_j is the column vector of coefficients. We assemble
these column vectors into an m×n matrix

(3.4.19) A = \begin{pmatrix} | & & | \\ A_1 & \cdots & A_n \\ | & & | \end{pmatrix}.

Then

(3.4.20) SA = (SA_1, ..., SA_n) = (w_1, ..., w_n) = L.

We substitute SA for L into our undetermined linear combination:

LX = (SA)X.

The associative law for scalar multiplication implies that (SA)X = S(AX). The proof is the
same as for the associative law for multiplication of scalar matrices (which we omitted). If
AX = 0, then our combination LX will be zero too. Now since A is an m×n matrix with
m < n, the homogeneous system AX = 0 has a nontrivial solution X. Then LX = 0 is the
linear relation we are looking for. □
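A small instance may make the mechanism of this proof concrete. In the sketch below, S consists of the two vectors of (3.4.4), while the coefficient matrix A, the resulting set L = SA, and the solution X are illustrative choices of ours; the helper name combine is hypothetical.

```python
S = [(1, 0, 1), (1, 2, 0)]           # m = 2 vectors spanning the plane 2x1 - x2 - 2x3 = 0
A = [[1, 0, 1],
     [0, 1, 1]]                      # an m x n coefficient matrix with m = 2 < n = 3

def combine(vectors, coeffs):
    """Return the linear combination vectors[0]*coeffs[0] + ... as a tuple."""
    dim = len(vectors[0])
    return tuple(sum(v[i] * c for v, c in zip(vectors, coeffs)) for i in range(dim))

# L = SA: the j-th vector of L is S applied to the j-th column A_j of A.
L = [combine(S, [row[j] for row in A]) for j in range(3)]

X = (1, 1, -1)                       # a nontrivial solution of AX = 0
AX = tuple(sum(a * x for a, x in zip(row, X)) for row in A)
LX = combine(L, X)
print(L, AX, LX)
```

Because AX = 0, the combination LX = S(AX) is zero as well, so the three vectors of L satisfy a nontrivial linear relation, as the theorem predicts for three vectors in a space spanned by two.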


Proposition 3.4.21 Let V be a finite-dimensional vector space. 


(a) Any two bases of V have the same order (the same number of elements). 


(b) Let B be a basis. If a finite set S of vectors spans V, then |S| ≥ |B|, and |S| = |B| if and
only if S is a basis.

(c) Let B be a basis. If a set L of vectors is independent, then |L| ≤ |B|, and |L| = |B| if and
only if L is a basis.

Proof. (a) We note here that two finite bases B_1 and B_2 have the same order, and we will
show in Corollary 3.7.7 that every basis of a finite-dimensional vector space is finite. Taking
S = B_1 and L = B_2 in Theorem 3.4.18 shows that |B_1| ≥ |B_2|, and similarly, |B_2| ≥ |B_1|.
Parts (b) and (c) follow from (a) and Proposition 3.4.16. □


Definition 3.4.22 The dimension of a finite-dimensional vector space V is the number of 
vectors in a basis. This dimension will be denoted by dim V. 


The dimension of the space F^n of column vectors is n because the standard basis E =
(e_1, ..., e_n) contains n elements.

Proposition 3.4.23 If W is a subspace of a finite-dimensional vector space V, then W is
finite-dimensional, and dim W ≤ dim V. Moreover, dim W = dim V if and only if W = V.


Proof. We start with any independent set L of vectors in W, possibly the empty set. If L 
doesn’t span W, we choose a vector w in W not in the span of L. Then L’ = (L, w) will be 
independent (3.4.15). We replace L by L’. 


Now it is obvious that if L is an independent subset of W, then it is also independent
when thought of as a subset of V. So Theorem 3.4.18 tells us that |L| ≤ dim V. Therefore
the process of adding elements to L must come to an end, and when it does, we will have a
basis of W. Since L contains at most dim V elements, dim W ≤ dim V. If |L| = dim V, then
Proposition 3.4.21(c) shows that L is a basis of V, and therefore W = V. □


3.5 COMPUTING WITH BASES 


The purpose of bases is to provide a method of computation, and we learn to use them in 
this section. We consider two topics: how to express a vector in terms of a basis, and how to 
relate different bases of the same vector space. 

Suppose we are given a basis B = (v_1, ..., v_n) of a vector space V over F. Remember:
This means that every vector v in V can be expressed as a combination

(3.5.1) v = v_1x_1 + ... + v_nx_n, with x_i in F,

in exactly one way (3.4.14). The scalars x_i are the coordinates of v, and the column vector

(3.5.2) X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}

is the coordinate vector of v, with respect to the basis B.

For example, (cos t, sin t) is a basis of the space of solutions of the differential equation
y'' = -y. Every solution of this differential equation is a linear combination of this basis. If
we are given another solution f(t), the coordinate vector (x_1, x_2)^t of f is the vector such
that f(t) = (cos t)x_1 + (sin t)x_2. Obviously, we need to know something about f to find X.
Not very much: just enough to determine two coefficients. Most properties of f are implicit
in the fact that it solves the differential equation.
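One way to "determine two coefficients" is to evaluate f at two well-chosen points: since cos 0 = 1 and sin 0 = 0, we get x_1 = f(0), and similarly x_2 = f(π/2). The sketch below assumes a particular solution f(t) = 2 sin(t + π/6), which is our own example and does not come from the text.

```python
import math

def f(t):
    # A hypothetical solution of y'' = -y, chosen only for illustration.
    return 2 * math.sin(t + math.pi / 6)

# With respect to the basis (cos t, sin t): f(t) = (cos t) x1 + (sin t) x2,
# so evaluating at t = 0 and t = pi/2 isolates the two coordinates.
x1 = f(0.0)
x2 = f(math.pi / 2)
print(x1, x2)

# The two coordinates reproduce f at other points, up to rounding error.
for t in (0.3, 1.0, 2.5):
    print(abs(f(t) - (math.cos(t) * x1 + math.sin(t) * x2)) < 1e-12)
```

Here the coordinate vector works out to (1, √3)^t up to floating-point rounding, in agreement with the addition formula for the sine.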

What we can always do, given a basis B of a vector space of dimension n, is to define
an isomorphism of vector spaces (see 3.3.5) from the space F^n to V:

(3.5.3) ψ: F^n → V that sends X ⇝ BX.

We will often denote this isomorphism by B, because it sends a vector X to BX.

Proposition 3.5.4 Let S = (v_1, ..., v_n) be a subset of a vector space V, and let ψ: F^n → V
be the map defined by ψ(X) = SX. Then

(a) ψ is injective if and only if S is independent,
(b) ψ is surjective if and only if S spans V, and
(c) ψ is bijective if and only if S is a basis of V.

This follows from the definitions of independence, span, and basis. □


Given a basis, the coordinate vector of a vector v in V is obtained by inverting the map
ψ (3.5.3). We won’t have a formula for the inverse function unless the basis is given more
explicitly, but the existence of the isomorphism is interesting:

Corollary 3.5.5 Every vector space V of dimension n over a field F is isomorphic to the
space F^n of column vectors. □

Notice also that F^n is not isomorphic to F^m when m ≠ n, because F^n has a basis of n
elements, and the number of elements in a basis depends only on the vector space. Thus the
finite-dimensional vector spaces over a field F are completely classified. The spaces F^n of
column vectors are representative elements for the isomorphism classes.

The fact that a vector space of dimension n is isomorphic to F^n will allow us to
translate problems on vector spaces to the familiar algebra of column vectors, once a basis
is chosen. Unfortunately, the same vector space V will have many bases. Identifying V with
the isomorphic space F^n is useful when a natural basis is in hand, but not when a basis is
poorly suited to a given problem. In that case, we will need to change coordinates, i.e., to
change the basis.

The space of solutions of a homogeneous linear equation AX = 0, for instance, almost
never has a natural basis. The space W of solutions of the equation 2x_1 - x_2 - 2x_3 = 0
has dimension 2, and we exhibited a basis before: B = (w_1, w_2), where w_1 = (1, 0, 1)^t and
w_2 = (1, 2, 0)^t (see (3.4.4)). Using this basis, we obtain an isomorphism of vector spaces
R^2 → W that we may denote by B. Since the unknowns in the equation are labeled x_i, we
need to choose another symbol for variable elements of R^2 here. We’ll use Y = (y_1, y_2)^t.
The isomorphism B sends Y to the coordinate vector of BY = w_1y_1 + w_2y_2 that was
displayed in (3.4.4).

However, there is nothing very special about the two particular solutions w_1 and w_2.
Most other pairs of solutions would serve just as well. The solutions w'_1 = (0, 2, -1)^t and
w'_2 = (1, 4, -1)^t give us a second basis B' = (w'_1, w'_2) of W. Either basis suffices to express
the solutions uniquely. A solution can be written in either one of the forms

(3.5.6) \begin{pmatrix} y_1 + y_2 \\ 2y_2 \\ y_1 \end{pmatrix} or \begin{pmatrix} y'_2 \\ 2y'_1 + 4y'_2 \\ -y'_1 - y'_2 \end{pmatrix}.

Change of Basis 
Suppose that we are given two bases of the same vector space V, say B = (v_1, ..., v_n) and
B' = (v'_1, ..., v'_n). We wish to make two computations. We ask first: How are the two bases
related? Second, a vector v in V will have coordinates with respect to each of these bases,
but they will be different. So we ask: How are the two coordinate vectors related? These are
the basechange computations, and they will be very important in later chapters. They can
also drive you nuts if you don’t organize the notation carefully.

Let’s think of B as the old basis and B' as a new basis. We note that every vector of the
new basis B' is a linear combination of the old basis B. We write this combination as

(3.5.7) v'_j = v_1p_{1j} + v_2p_{2j} + ... + v_np_{nj}.

The column vector P_j = (p_{1j}, ..., p_{nj})^t is the coordinate vector of the new basis vector
v'_j, when it is computed using the old basis. We collect these column vectors into a square
matrix P, obtaining the matrix equation B' = BP:

(3.5.8) B' = (v'_1, ..., v'_n) = (v_1, ..., v_n)P = BP.

The jth column of P is the coordinate vector of the new basis vector v'_j with respect to the
old basis. This matrix P is the basechange matrix.¹


Proposition 3.5.9 


(a) Let B and B’ be two bases of a vector space V. The basechange matrix P is an invertible 
matrix that is determined uniquely by the two bases B and B’. 

(b) Let B = (v_1, ..., v_n) be a basis of a vector space V. The other bases are the sets of the
form B' = BP, where P can be any invertible n×n matrix.

Proof. (a) The equation B' = BP expresses the basis vectors v'_i as linear combinations of
the basis B. There is just one way to do this (3.4.14), so P is unique. To show that P is
an invertible matrix, we interchange the roles of B and B'. There is a matrix Q such that
B = B'Q. Then

B = B'Q = BPQ, or (v_1, ..., v_n) = (v_1, ..., v_n)PQ.

This equation expresses each v_i as a combination of the vectors (v_1, ..., v_n). The entries
of the product matrix PQ are the coefficients. But since B is a basis, there is just one way to
write v_i as a combination of (v_1, ..., v_n), namely v_i = v_i, or in matrix notation, B = BI. So
PQ = I.

1 This basechange matrix is the inverse of the one that was used in the first edition.


(b) We must show that if B is a basis and if P is an invertible matrix, then B' = BP is also a
basis. Since P is invertible, B = B'P^{-1}. This tells us that the vectors v_i are in the span of B'.
Therefore B' spans V, and since it has the same number of elements as B, it is a basis. □

Let X and X' be the coordinate vectors of the same arbitrary vector v, computed with
respect to the two bases B and B', respectively, that is, v = BX and v = B'X'. Substituting
B = B'P^{-1} gives us the matrix equation

(3.5.10) v = BX = B'P^{-1}X.

This shows that the coordinate vector of v with respect to the new basis B', which we call X',
is P^{-1}X. We can also write this as X = PX'.

Recapitulating, we have a single matrix P, the basechange matrix, with the dual 
properties 


(3.5.11) B' = BP and PX' = X,


where X and X’ denote the coordinate vectors of the same arbitrary vector v, with respect 
to the two bases. Each of these properties characterizes P. Please take note of the positions 
of P in the two relations. 


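Going back once more to the equation 2x_1 - x_2 - 2x_3 = 0, let B and B' be the bases
of the space W of solutions described above, in (3.5.6). The basechange matrix solves the
equation

\begin{pmatrix} 0 & 1 \\ 2 & 4 \\ -1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}. It is P = \begin{pmatrix} -1 & -1 \\ 1 & 2 \end{pmatrix}.

The coordinate vectors Y and Y' of a given vector v with respect to these two bases, the ones
that appear in (3.5.6), are related by the equation

Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} -1 & -1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} y'_1 \\ y'_2 \end{pmatrix} = PY'.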
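Both basechange relations for this example can be confirmed numerically. In the sketch below, the columns of B_old are w_1 = (1, 0, 1)^t and w_2 = (1, 2, 0)^t, the columns of B_new are w'_1 = (0, 2, -1)^t and w'_2 = (1, 4, -1)^t, and P is the matrix (-1, -1; 1, 2); the coordinate vector Y_new is an arbitrary choice of ours, and the helper name matmul is hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

B_old = [[1, 1], [0, 2], [1, 0]]       # columns w_1, w_2
B_new = [[0, 1], [2, 4], [-1, -1]]     # columns w'_1, w'_2
P = [[-1, -1], [1, 2]]                 # the basechange matrix

print(matmul(B_old, P) == B_new)       # checks B' = BP

Y_new = [[3], [5]]                     # arbitrary coordinates with respect to B'
Y_old = matmul(P, Y_new)               # Y = PY'
print(matmul(B_old, Y_old) == matmul(B_new, Y_new))   # the same vector of W
```

Note the positions of P: it multiplies the old basis on the right to give the new basis, and it multiplies the new coordinates on the left to give the old coordinates.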


Another example: Let B be the basis (cos t, sin t) of the space of solutions of the differential
equation d^2y/dt^2 = -y. If we allow complex valued functions, then the exponential functions
e^{±it} = cos t ± i sin t are also solutions, and B' = (e^{it}, e^{-it}) is a new basis of the space of
solutions. The basechange computation is

(3.5.12) (e^{it}, e^{-it}) = (cos t, sin t) \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}.
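The basechange matrix for this example has columns (1, i)^t and (1, -i)^t, which are the coordinate vectors of e^{it} and e^{-it} with respect to (cos t, sin t). A quick numerical check at a few sample points is sketched below; the sample values of t and the function name new_basis_at are our own.

```python
import cmath
import math

P = [[1, 1],
     [1j, -1j]]      # columns: coordinates of e^{it} and e^{-it} in the basis (cos t, sin t)

def new_basis_at(t):
    """Evaluate (cos t, sin t) P, the two functions of the new basis, at t."""
    old = (math.cos(t), math.sin(t))
    return tuple(old[0] * P[0][j] + old[1] * P[1][j] for j in range(2))

for t in (0.0, 0.7, 2.5):
    e_plus, e_minus = new_basis_at(t)
    print(cmath.isclose(e_plus, cmath.exp(1j * t), abs_tol=1e-12),
          cmath.isclose(e_minus, cmath.exp(-1j * t), abs_tol=1e-12))
```

Agreement at sample points is of course only a sanity check; the identity itself is Euler's formula.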


One case in which the basechange matrix is easy to determine is that V is the space
F^n of column vectors, the old basis is the standard basis E = (e_1, ..., e_n), and the new
basis, we’ll denote it by B = (v_1, ..., v_n) here, is arbitrary. Let the coordinate vector of v_i,
with respect to the standard basis, be the column vector B_i. So v_i = EB_i. We assemble these
column vectors into an n×n matrix that we denote by [B]:

(3.5.13) [B] = \begin{pmatrix} | & & | \\ B_1 & \cdots & B_n \\ | & & | \end{pmatrix}. Then (v_1, ..., v_n) = (e_1, ..., e_n) \begin{pmatrix} | & & | \\ B_1 & \cdots & B_n \\ | & & | \end{pmatrix},

i.e., B = E[B]. Therefore [B] is the basechange matrix from the standard basis E to B.


3.6 DIRECT SUMS 


The concepts of independence and span of a set of vectors have analogues for subspaces. 


If W_1, ..., W_k are subspaces of a vector space V, the set of vectors v that can be written
as a sum

(3.6.1) v = w_1 + ... + w_k,

where w_i is in W_i is called the sum of the subspaces or their span, and is denoted by
W_1 + ... + W_k:

(3.6.2) W_1 + ... + W_k = {v ∈ V | v = w_1 + ... + w_k, with w_i in W_i}.

The sum of the subspaces is the smallest subspace that contains all of the subspaces
W_1, ..., W_k. It is analogous to the span of a set of vectors.

The subspaces W_1, ..., W_k are called independent if no sum w_1 + ... + w_k with w_i in
W_i is zero, except for the trivial sum, in which w_i = 0 for all i. In other words, the spaces are
independent if

(3.6.3) w_1 + ... + w_k = 0, with w_i in W_i, implies w_i = 0 for all i.

Note: Suppose that v_1, ..., v_k are elements of V, and let W_i be the span of the vector
v_i. Then the subspaces W_1, ..., W_k are independent if and only if the set (v_1, ..., v_k) is
independent. This becomes clear if we compare (3.4.8) and (3.6.3). The statement in terms
of subspaces is actually the neater one, because scalar coefficients don’t need to be put in
front of the vectors w_i in (3.6.3). Since each of the subspaces W_i is closed under scalar
multiplication, a scalar multiple cw_i is simply another element of W_i. □


We omit the proof of the next proposition. 


Proposition 3.6.4 Let W_1, ..., W_k be subspaces of a finite-dimensional vector space V, and
let B_i be a basis of W_i.
(a) The following conditions are equivalent:
• The subspaces W_i are independent, and the sum W_1 + ... + W_k is equal to V.
• The set B = (B_1, ..., B_k) obtained by appending the bases B_i is a basis of V.
(b) dim(W_1 + ... + W_k) ≤ dim W_1 + ... + dim W_k, with equality if and only if the spaces
are independent.
(c) If W'_i is a subspace of W_i for i = 1, ..., k, and if the spaces W_1, ..., W_k are independent,
then so are the W'_1, ..., W'_k. □

If the conditions of Proposition 3.6.4(a) are satisfied, we say that V is the direct sum of
W_1, ..., W_k, and we write V = W_1 ⊕ ... ⊕ W_k:

(3.6.5) V = W_1 ⊕ ... ⊕ W_k, if W_1 + ... + W_k = V and W_1, ..., W_k are independent.

If V is the direct sum, every vector v in V can be written in the form (3.6.1) in exactly one
way.


Proposition 3.6.6 Let W_1 and W_2 be subspaces of a finite-dimensional vector space V.

(a) dim W_1 + dim W_2 = dim(W_1 ∩ W_2) + dim(W_1 + W_2).
(b) W_1 and W_2 are independent if and only if W_1 ∩ W_2 = {0}.
(c) V is the direct sum W_1 ⊕ W_2 if and only if W_1 ∩ W_2 = {0} and W_1 + W_2 = V.
(d) If W_1 + W_2 = V, there is a subspace W'_2 of W_2 such that W_1 ⊕ W'_2 = V.


Proof. We prove the key part (a): We choose a basis, U = (u_1, ..., u_k) for W_1 ∩ W_2, and we
extend it to a basis (U, V) = (u_1, ..., u_k; v_1, ..., v_m) of W_1. We also extend U to a basis
(U, W) = (u_1, ..., u_k; w_1, ..., w_n) of W_2. Then dim(W_1 ∩ W_2) = k, dim W_1 = k + m,
and dim W_2 = k + n. The assertion will follow if we prove that the set of k + m + n elements
(U, V, W) = (u_1, ..., u_k; v_1, ..., v_m; w_1, ..., w_n) is a basis of W_1 + W_2.

We must show that (U, V, W) is independent and spans W_1 + W_2. An element v of
W_1 + W_2 has the form w' + w'' where w' is in W_1 and w'' is in W_2. We write w' in terms of
our basis (U, V) for W_1, say w' = UX + VY = u_1x_1 + ... + u_kx_k + v_1y_1 + ... + v_my_m. We
also write w'' as a combination UX' + WZ of our basis (U, W) for W_2. Then v = w' + w'' =
U(X + X') + VY + WZ.

Next, suppose we are given a linear relation UX + VY + WZ = 0, among the elements
(U, V, W). We write this as UX + VY = -WZ. The left side of this equation is in W_1 and the
right side is in W_2. Therefore -WZ is in W_1 ∩ W_2, and so it is a linear combination UX' of the
basis U. This gives us an equation UX' + WZ = 0. Since the set (U, W) is a basis for W_2, it is
independent, and therefore X' and Z are zero. The given relation reduces to UX + VY = 0.
But (U, V) is also an independent set. So X and Y are zero. The relation was trivial. □
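Over the field F_2 a subspace is a finite set, so the dimension formula of part (a) can be checked by listing elements: a subspace of dimension d has exactly 2^d vectors. The sketch below does this for one pair of subspaces of F_2^4; the generators are arbitrary choices of ours, and the names span_f2 and dim_f2 are hypothetical.

```python
from itertools import product

def span_f2(generators, n):
    """All F_2-linear combinations of the generators, as a set of n-tuples."""
    vectors = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        v = tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % 2 for i in range(n))
        vectors.add(v)
    return vectors

def dim_f2(subspace):
    """A subspace of F_2^n with 2^d elements has dimension d."""
    return len(subspace).bit_length() - 1

n = 4
W1 = span_f2([(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)], n)
W2 = span_f2([(0, 1, 0, 0), (0, 0, 1, 1)], n)

intersection = W1 & W2
# The sum W1 + W2 consists of all vectors w' + w'' with w' in W1 and w'' in W2.
total = {tuple((a + b) % 2 for a, b in zip(u, v)) for u in W1 for v in W2}

print(dim_f2(W1) + dim_f2(W2), dim_f2(intersection) + dim_f2(total))
```

The two printed numbers agree. In this instance the intersection is not {0}, so W_1 and W_2 are not independent, and W_1 + W_2 is all of F_2^4 without being a direct sum.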


3.7. INFINITE-DIMENSIONAL SPACES 


Vector spaces that are too big to be spanned by any finite set of vectors are called infinite- 
dimensional. We won’t need them very often, but they are important in analysis, so we 
discuss them briefly here. 


One of the simplest examples of an infinite-dimensional space is the space R^∞ of
infinite real row vectors

(3.7.1) (a) = (a_1, a_2, a_3, ...).

An infinite vector can be thought of as a sequence a_1, a_2, ... of real numbers.

The space R^∞ has many infinite-dimensional subspaces. Here are a few; you will be
able to make up some more:

Examples 3.7.2

(a) Convergent sequences: C = {(a) ∈ R^∞ | the limit lim_{n→∞} a_n exists}.

(b) Absolutely convergent series: ℓ^1 = {(a) ∈ R^∞ | Σ_{n=1}^∞ |a_n| < ∞}.

(c) Sequences with finitely many terms different from zero:
Z = {(a) ∈ R^∞ | a_n = 0 for all but finitely many n}.


Now suppose that V is a vector space, infinite-dimensional or not. What do we mean
by the span of an infinite set S of vectors? It isn’t always possible to assign a value to
an infinite combination c_1v_1 + c_2v_2 + .... If V is the vector space R^n, then a value can
be assigned provided that the series c_1v_1 + c_2v_2 + ... converges. But many series don’t
converge, and then we don’t know what value to assign. In algebra it is customary to speak
only of combinations of finitely many vectors. The span of an infinite set S is defined to be
the set of the vectors v that are combinations of finitely many elements of S:

(3.7.3) v = c_1v_1 + ... + c_rv_r, where v_1, ..., v_r are in S.

The vectors v_i in S can be arbitrary, and the number r is allowed to depend on the vector v
and to be arbitrarily large:

(3.7.4) Span S = {finite combinations of elements of S}.

For example, let e_i = (0, ..., 0, 1, 0, ...) be the row vector in R^∞ with 1 in the ith
position as its only nonzero coordinate. Let E = (e_1, e_2, e_3, ...) be the set of these vectors.
This set does not span R^∞, because the vector

w = (1, 1, 1, ...)

is not a (finite) combination. The span of the set E is the subspace Z (3.7.2)(c).

A set S, finite or infinite, is independent if there is no finite linear relation

(3.7.5) c_1v_1 + ... + c_rv_r = 0, with v_1, ..., v_r in S,

except for the trivial relation in which c_1 = ... = c_r = 0. Again, the number r is allowed to
be arbitrary, that is, the condition has to hold for arbitrarily large r and arbitrary elements
v_1, ..., v_r of S. For example, the set S' = (w; e_1, e_2, e_3, ...) is independent, if w and e_i are
the vectors defined above. With this definition of independence, Proposition 3.4.15 continues
to be true.

As with finite sets, a basis S of V is an independent set that spans V. The set
S = (e_1, e_2, ...) is a basis of the space Z. The monomials x^i form a basis for the space
of polynomials. It can be shown, using Zorn’s Lemma or the Axiom of Choice, that every
vector space V has a basis (see the appendix, Proposition A.3.3). However, a basis for R^∞
will have uncountably many elements, and cannot be made very explicit.

Let us go back for a moment to the case that our vector space V is finite-dimensional 
(3.4.16), and ask if there can be an infinite basis. We saw in (3.4.21) that any two finite bases 
have the same number of elements. We complete the picture now, by showing that every 
basis is finite. This follows from the next lemma. 


Lemma 3.7.6 Let V be a finite-dimensional vector space, and let S be any set that spans V. 
Then S contains a finite subset that spans V. 


Proof. By hypothesis, there is a finite set, say $(u_1, \ldots, u_m)$, that spans $V$. Because $S$ spans
$V$, each of the vectors $u_i$ is a linear combination of finitely many elements of $S$. The elements
of $S$ that we use to write all of these vectors as linear combinations make up a finite subset
$S'$ of $S$. Then the vectors $u_i$ are in $\operatorname{Span} S'$, and since $(u_1, \ldots, u_m)$ spans $V$, so does $S'$.  □


Corollary 3.7.7 Let V be a finite-dimensional vector space. 


• Every basis is finite.
• Every set $S$ that spans $V$ contains a basis.
• Every independent set $L$ is finite, and can be extended to a basis.  □


I don’t need to learn 8 + 7: I'll remember 8 + 8 and subtract 1.
—T. Cuyler Young, Jr. 


EXERCISES 


Section 2 Fields


2.1. Prove that the numbers of the form $a + b\sqrt{2}$, where $a$ and $b$ are rational numbers, form a
subfield of $\mathbb{C}$.


2.2. Find the inverse of 5 modulo $p$, for $p = 7$, 11, 13, and 17.
2.3. Compute the product polynomial $(x^3 + 3x^2 + 3x + 1)(x^4 + 4x^3 + 6x^2 + 4x + 1)$ when the
coefficients are regarded as elements of the field $\mathbb{F}_7$. Explain your answer.


2.4. Consider the system of linear equations = hs. 
2. 6 eee 1 


(a) Solve the system in $\mathbb{F}_p$ when $p = 5$, 11, and 17.
(b) Determine the number of solutions when p = 7. 


2.5. Determine the primes p such that the matrix 


Pez U 
Ags ea 
PU 


is invertible, when its entries are considered to be in $\mathbb{F}_p$.


2.6. Solve completely the systems of linear equations $AX = 0$ and $AX = B$, where


(a) in $\mathbb{Q}$, (b) in $\mathbb{F}_2$, (c) in $\mathbb{F}_3$, (d) in $\mathbb{F}_7$.


2.7. By finding primitive elements, verify that the multiplicative group $\mathbb{F}_p^\times$ is cyclic for all primes
$p < 20$.


2.8. Let $p$ be a prime integer.

(a) Prove Fermat's Theorem: For every integer $a$, $a^p \equiv a$ modulo $p$.
(b) Prove Wilson’s Theorem: $(p - 1)! \equiv -1$ (modulo $p$).


2.9. Determine the orders of the matrices : and ? 1 in the group GL(F7).


2.10. Interpreting matrix entries in the field $\mathbb{F}_2$, prove that the four matrices

$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$ form a field.

Hint. You can cut the work down by using the fact that various laws are known to hold for
addition and multiplication of matrices.


2.11. Prove that the set of symbols $\{a + bi \mid a, b \in \mathbb{F}_3\}$ forms a field with nine elements, if the
laws of composition are made to mimic addition and multiplication of complex numbers.
Will the same method work for $\mathbb{F}_5$? For $\mathbb{F}_7$? Explain.


Section 3 Vector Spaces


3.1. (a) Prove that the scalar product of a vector with the zero element of the field $F$ is the
zero vector.
(b) Prove that if $w$ is an element of a subspace $W$, then $-w$ is in $W$ too.


3.2. Which of the following subsets is a subspace of the vector space $F^{n \times n}$ of $n \times n$ matrices
with coefficients in $F$?
(a) symmetric matrices ($A = A^t$), (b) invertible matrices, (c) upper triangular matrices.


Section 4 Bases and Dimension 


4.1. Find a basis for the space of $n \times n$ symmetric matrices ($A^t = A$).


4.2. Let $W \subset \mathbb{R}^4$ be the space of solutions of the system of linear equations $AX = 0$, where


Al pee : ;
=|: 1 3 g |: Find a basis for W.


4.3. Prove that the three functions $x^2$, $\cos x$, and $e^x$ are linearly independent.


4.4. Let $A$ be an $m \times n$ matrix, and let $A'$ be the result of a sequence of elementary row
operations on $A$. Prove that the rows of $A$ span the same space as the rows of $A'$.


4.5. Let $V = F^n$ be the space of column vectors. Prove that every subspace $W$ of $V$ is the
space of solutions of some system of homogeneous linear equations $AX = 0$.


4.6. Find a basis of the space of solutions in $\mathbb{R}^n$ of the equation


$x_1 + 2x_2 + 3x_3 + \cdots + nx_n = 0$.


4.7. Let $(X_1, \ldots, X_m)$ and $(Y_1, \ldots, Y_n)$ be bases for $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively. Do the $mn$
matrices $X_i Y_j^t$ form a basis for the vector space $\mathbb{R}^{m \times n}$ of all $m \times n$ matrices?


4.8. Prove that a set $(v_1, \ldots, v_n)$ of vectors in $F^n$ is a basis if and only if the matrix obtained
by assembling the coordinate vectors of $v_i$ is invertible.


Section 5 Computing with Bases 


5.1. (a) Prove that the set $B = ((1, 2, 0)^t, (2, 1, 2)^t, (3, 1, 1)^t)$ is a basis of $\mathbb{R}^3$.
(b) Find the coordinate vector of the vector $v = (1, 2, 3)^t$ with respect to this basis.
(c) Let $B' = ((0, 1, 0)^t, (1, 0, 1)^t, (2, 1, 0)^t)$. Determine the basechange matrix $P$ from $B$
to $B'$.
5.2. (a) Determine the basechange matrix in $\mathbb{R}^2$, when the old basis is the standard basis
$E = (e_1, e_2)$ and the new basis is $B = (e_1 + e_2, e_1 - e_2)$.
(b) Determine the basechange matrix in $\mathbb{R}^n$, when the old basis is the standard basis $E$
and the new basis is $B = (e_n, e_{n-1}, \ldots, e_1)$.
(c) Let $B$ be the basis of $\mathbb{R}^2$ in which $v_1 = e_1$ and $v_2$ is a vector of unit length making an
angle of 120° with $v_1$. Determine the basechange matrix that relates $E$ to $B$.


5.3. Let $B = (v_1, \ldots, v_n)$ be a basis of a vector space $V$. Prove that one can get from $B$ to any
other basis $B'$ by a finite sequence of steps of the following types:
(i) Replace $v_i$ by $v_i + av_j$, $i \neq j$, for some $a$ in $F$,
(ii) Replace $v_i$ by $cv_i$ for some $c \neq 0$,
(iii) Interchange $v_i$ and $v_j$.


5.4. Let $\mathbb{F}_p$ be a prime field, and let $V = \mathbb{F}_p^2$. Prove:


(a) The number of bases of $V$ is equal to the order of the general linear group $GL_2(\mathbb{F}_p)$.
(b) The order of the general linear group $GL_2(\mathbb{F}_p)$ is $p(p + 1)(p - 1)^2$, and the order of
the special linear group $SL_2(\mathbb{F}_p)$ is $p(p + 1)(p - 1)$.


5.5. How many subspaces of each dimension are there in (a) F3, (b) 2


Section 6 Direct Sums


6.1. Prove that the space $\mathbb{R}^{n \times n}$ of all $n \times n$ real matrices is the direct sum of the space of
symmetric matrices ($A^t = A$) and the space of skew-symmetric matrices ($A^t = -A$).


6.2. The trace of a square matrix is the sum of its diagonal entries. Let $W_1$ be the space of $n \times n$
matrices whose trace is zero. Find a subspace $W_2$ so that $\mathbb{R}^{n \times n} = W_1 \oplus W_2$.


6.3. Let $W_1, \ldots, W_k$ be subspaces of a vector space $V$, such that $V = \sum W_i$. Assume that
$W_1 \cap W_2 = 0$, $(W_1 + W_2) \cap W_3 = 0$, ..., $(W_1 + W_2 + \cdots + W_{k-1}) \cap W_k = 0$. Prove that
$V$ is the direct sum of the subspaces $W_1, \ldots, W_k$.


Section 7 Infinite-Dimensional Spaces


7.1. Let $E$ be the set of vectors $(e_1, e_2, \ldots)$ in $\mathbb{R}^\infty$, and let $w = (1, 1, 1, \ldots)$. Describe the
span of the set $(w, e_1, e_2, \ldots)$.


7.2. The doubly infinite row vectors $(a) = (\ldots, a_{-1}, a_0, a_1, \ldots)$, with $a_i$ real, form a vector
space. Prove that this space is isomorphic to $\mathbb{R}^\infty$.


*7.3. For every positive integer $p$, we can define the space $\ell^p$ to be the space of sequences such
that $\sum |a_i|^p < \infty$. Prove that $\ell^p$ is a proper subspace of $\ell^{p+1}$.


*7.4. Let $V$ be a vector space that is spanned by a countably infinite set. Prove that every
independent subset of $V$ is finite or countably infinite.


Miscellaneous Problems 


M.1. Consider the determinant function $\det: F^{2 \times 2} \to F$, where $F = \mathbb{F}_p$ is the prime field of
order $p$ and $F^{2 \times 2}$ is the space of $2 \times 2$ matrices. Show that this map is surjective, that all
nonzero values of the determinant are taken on the same number of times, but that there
are more matrices with determinant 0 than with determinant 1.


M.2. Let $A$ be a real $n \times n$ matrix. Prove that there is an integer $N$ such that $A$ satisfies a
nontrivial polynomial relation $A^N + c_{N-1}A^{N-1} + \cdots + c_1 A + c_0 = 0$.


M.3. (polynomial paths) (a) Let $x(t)$ and $y(t)$ be quadratic polynomials with real coefficients.
Prove that the image of the path $(x(t), y(t))$ is contained in a conic, i.e., that there is a real
quadratic polynomial $f(x, y)$ such that $f(x(t), y(t))$ is identically zero.

(b) Let $x(t) = t^2 - 1$ and $y(t) = t^3 - t$. Find a nonzero real polynomial $f(x, y)$ such that
$f(x(t), y(t))$ is identically zero. Sketch the locus $\{f(x, y) = 0\}$ and the path $(x(t), y(t))$
in $\mathbb{R}^2$.

(c) Prove that every pair $x(t)$, $y(t)$ of real polynomials satisfies some real polynomial
relation $f(x, y) = 0$.


*M.4. Let $V$ be a vector space over an infinite field $F$. Prove that $V$ is not the union of finitely
many proper subspaces.


*M.5. Let $\alpha$ be the real cube root of 2.

(a) Prove that $(1, \alpha, \alpha^2)$ is an independent set over $\mathbb{Q}$, i.e., that there is no relation of the
form $a + b\alpha + c\alpha^2 = 0$ with integers $a, b, c$.

Hint: Divide $x^3 - 2$ by $cx^2 + bx + a$.

(b) Prove that the real numbers $a + b\alpha + c\alpha^2$ with $a, b, c$ in $\mathbb{Q}$ form a field.


M.6. (Tabasco sauce: a mathematical diversion) My cousin Phil collects hot sauce. He has about
a hundred different bottles on the shelf, and many of them, Tabasco for instance, have only
three ingredients other than water: chilis, vinegar, and salt. What is the smallest number
of bottles of hot sauce that Phil would need to keep on hand so that he could obtain any
recipe that uses only these three ingredients by mixing the ones he had?


CHAPTER 4 


Linear Operators 


That confusions of thought and errors of reasoning 
still darken the beginnings of Algebra, 
is the earnest and just complaint of sober and thoughtful men. 


—Sir William Rowan Hamilton 


4.1 THE DIMENSION FORMULA 


A linear transformation $T: V \to W$ from one vector space over a field $F$ to another is a
map that is compatible with addition and scalar multiplication:


(4.1.1)    $T(v_1 + v_2) = T(v_1) + T(v_2)$  and  $T(cv_1) = cT(v_1)$,


for all $v_1$ and $v_2$ in $V$ and all $c$ in $F$. This is analogous to a homomorphism of groups, and
calling it a homomorphism would be appropriate too. A linear transformation is compatible
with arbitrary linear combinations:


(4.1.2)    $T\bigl(\sum_i v_i c_i\bigr) = \sum_i T(v_i) c_i$.


Left multiplication by an $m \times n$ matrix $A$ with entries in $F$, the map


(4.1.3)    $F^n \xrightarrow{A} F^m$ that sends $X \rightsquigarrow AX$


is a linear transformation. Indeed, $A(X_1 + X_2) = AX_1 + AX_2$, and $A(cX) = cAX$.

If $B = (v_1, \ldots, v_n)$ is a subset of a vector space $V$ over the field $F$, the map $F^n \to V$
that sends $X \rightsquigarrow BX$ is a linear transformation.

Another example: Let $P_n$ be the vector space of real polynomial functions


(4.1.4)    $a_n t^n + a_{n-1} t^{n-1} + \cdots + a_1 t + a_0$


of degree at most $n$. The derivative $\frac{d}{dt}$ defines a linear transformation from $P_n$ to $P_{n-1}$.


There are two important subspaces associated with a linear transformation: its kernel
and its image:


(4.1.5)    $\ker T$ = kernel of $T$ = $\{v \in V \mid T(v) = 0\}$,
           $\operatorname{im} T$ = image of $T$ = $\{w \in W \mid w = T(v) \text{ for some } v \in V\}$.


The kernel is often called the nullspace of the linear transformation. As one may guess from
the analogy with group homomorphisms, the kernel is a subspace of $V$ and the image is a
subspace of $W$.


The main result of this section is the next theorem.

Theorem 4.1.6 Dimension Formula. Let $T: V \to W$ be a linear transformation. Then

$\dim(\ker T) + \dim(\operatorname{im} T) = \dim V$.


The nullity and the rank of a linear transformation T are the dimensions of the kernel 
and the image, respectively, and the nullity and rank of a matrix A are defined analogously. 
With this terminology, (4.1.6) becomes 


(4.1.7) nullity + rank = dimension of V. 


Proof of Theorem (4.1.6). We’ll assume that $V$ is finite-dimensional, say of dimension $n$. Let
$k$ be the dimension of $\ker T$, and let $(u_1, \ldots, u_k)$ be a basis for the kernel. We extend this
set to a basis of $V$:


(4.1.8)    $(u_1, \ldots, u_k; v_1, \ldots, v_{n-k})$


(see Section 3.4). For $i = 1, \ldots, n - k$, let $w_i = T(v_i)$. If we prove that $C = (w_1, \ldots, w_{n-k})$ is
a basis for the image, it will follow that the image has dimension $n - k$, and this will prove
the theorem.


We must show that $C$ spans the image and that it is an independent set. Let $w$ be an
element of the image. Then $w = T(v)$ for some $v$ in $V$. We write $v$ in terms of the basis:


$v = a_1 u_1 + \cdots + a_k u_k + b_1 v_1 + \cdots + b_{n-k} v_{n-k}$

and apply $T$, noting that $T(u_i) = 0$:

$w = T(v) = b_1 w_1 + \cdots + b_{n-k} w_{n-k}$.


Thus $w$ is in the span of $C$.


Next, we show that $C$ is independent. Suppose we have a linear relation

(4.1.9)    $c_1 w_1 + \cdots + c_{n-k} w_{n-k} = 0$.

Let $v = c_1 v_1 + \cdots + c_{n-k} v_{n-k}$, where $v_i$ are the vectors in (4.1.8). Then

$T(v) = c_1 w_1 + \cdots + c_{n-k} w_{n-k} = 0$,


so $v$ is in the nullspace. We write $v$ in terms of the basis $(u_1, \ldots, u_k)$ of the nullspace, say
$v = a_1 u_1 + \cdots + a_k u_k$. Then


$-a_1 u_1 - \cdots - a_k u_k + c_1 v_1 + \cdots + c_{n-k} v_{n-k} = -v + v = 0$.


But the basis (4.1.8) is independent. So $-a_1 = 0, \ldots, -a_k = 0$, and $c_1 = 0, \ldots, c_{n-k} = 0$.
The relation (4.1.9) was trivial. Therefore $C$ is independent.  □

When $T$ is left multiplication by a matrix $A$ (4.1.3), the kernel of $T$, the nullspace of $A$,
is the set of solutions of the homogeneous equation $AX = 0$. The image of $T$ is the column
space, the space spanned by the columns of $A$, which is also the set of vectors $B$ in $F^m$ such
that the linear equation $AX = B$ has a solution (3.4.6).

It is a familiar fact that by adding the solutions of the homogeneous equation $AX = 0$ to
a particular solution $X_0$ of the inhomogeneous equation $AX = B$, one obtains all solutions of
the inhomogeneous equation. Another way to say this is that the set of solutions of $AX = B$
is the additive coset $X_0 + N$ of the nullspace $N$ in $F^n$.

An $n \times n$ matrix $A$ whose determinant isn’t zero is invertible, and the system of
equations $AX = B$ has a unique solution for every $B$. In this case, the nullspace is $\{0\}$, and
the column space is the whole space $F^n$. On the other hand, if the determinant is zero, the
nullspace $N$ has positive dimension, and the image, the column space, has dimension less
than $n$. Not all equations $AX = B$ have solutions, but those that do have a solution have
more than one solution, because the set of solutions is a coset of $N$.
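
The dimension formula can be checked numerically for a particular matrix. The following is a minimal sketch, not a definitive implementation: it works over the rational numbers using Python's `fractions` module, and the helper names `rref` and `nullspace_basis` as well as the sample matrix are illustrative assumptions, not part of the text. The sample matrix has determinant zero, so its nullspace has positive dimension.

```python
from fractions import Fraction

def rref(rows):
    """Row-reduce over the rationals; return (reduced matrix, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot here: column c is free
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def nullspace_basis(rows):
    """Basis of the solutions of AX = 0: one vector per free column."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for r, p in enumerate(pivots):
            x[p] = -m[r][f]
        basis.append(x)
    return basis

# Hypothetical singular 3x3 matrix (the second row is twice the first).
A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
rank = len(rref(A)[1])              # dimension of the column space
kernel = nullspace_basis(A)
nullity = len(kernel)               # dimension of the nullspace
print(rank, nullity, rank + nullity == len(A[0]))
```

Here the rank counts pivot columns and the nullity counts free columns, so their sum is the number of columns, which is the dimension of the domain $F^n$.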


4.2 THE MATRIX OF A LINEAR TRANSFORMATION 


Every linear transformation from one space of column vectors to another is left multiplication
by a matrix.


Lemma 4.2.1 Let $T: F^n \to F^m$ be a linear transformation between spaces of column
vectors, and let the coordinate vector of $T(e_j)$ be $A_j = (a_{1j}, \ldots, a_{mj})^t$. Let $A$ be the $m \times n$
matrix whose columns are $A_1, \ldots, A_n$. Then $T$ acts on vectors in $F^n$ as multiplication by $A$.

Proof. $T(X) = T\bigl(\sum_j e_j x_j\bigr) = \sum_j T(e_j) x_j = \sum_j A_j x_j = AX$.  □


For example, let $c = \cos\theta$, $s = \sin\theta$. Counterclockwise rotation $\rho: \mathbb{R}^2 \to \mathbb{R}^2$ of the
plane through the angle $\theta$ about the origin is a linear transformation. Its matrix is


(4.2.2)    $R = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}$.


Let’s verify that multiplication by this matrix rotates the plane. We write a vector $X$ in the
form $r(\cos\alpha, \sin\alpha)^t$, where $r$ is the length of $X$. Let $c' = \cos\alpha$ and $s' = \sin\alpha$. The addition
formulas for cosine and sine show that


$RX = r\begin{bmatrix} c & -s \\ s & c \end{bmatrix}\begin{bmatrix} c' \\ s' \end{bmatrix} = r\begin{bmatrix} cc' - ss' \\ sc' + cs' \end{bmatrix} = r\begin{bmatrix} \cos(\theta + \alpha) \\ \sin(\theta + \alpha) \end{bmatrix}$.

So $RX$ is obtained from $X$ by rotating through the angle $\theta$, as claimed.
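
The same verification can be carried out numerically. This is a small sketch in floating point; the function name `rotate` and the sample values of the angles and the length are arbitrary choices made for illustration.

```python
import math

def rotate(theta, x):
    """Multiply the column vector x = (x1, x2) by the rotation matrix (4.2.2)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

theta, alpha, r = 0.7, 1.1, 2.5                    # sample values
X = (r * math.cos(alpha), r * math.sin(alpha))     # vector of length r at angle alpha
RX = rotate(theta, X)
expected = (r * math.cos(theta + alpha), r * math.sin(theta + alpha))
print(RX, expected)                                # agree up to rounding error
```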
One can make a computation analogous to that of Lemma 4.2.1 with any linear
transformation $T: V \to W$, once bases of the two spaces are chosen. If $B = (v_1, \ldots, v_n)$ is
a basis of $V$, we use the shorthand notation $T(B)$ to denote the hypervector


(4.2.3)    $T(B) = (T(v_1), \ldots, T(v_n))$.

If $v = BX = v_1 x_1 + \cdots + v_n x_n$, then


(4.2.4)    $T(v) = T(v_1)x_1 + \cdots + T(v_n)x_n = T(B)X$.

Proposition 4.2.5 Let $T: V \to W$ be a linear transformation, and let $B = (v_1, \ldots, v_n)$ and
$C = (w_1, \ldots, w_m)$ be bases of $V$ and $W$, respectively. Let $X$ be the coordinate vector of an
arbitrary vector $v$ with respect to the basis $B$ and let $Y$ be the coordinate vector of its image
$T(v)$. So $v = BX$ and $T(v) = CY$. There is an $m \times n$ matrix $A$ with the dual properties


(4.2.6)    $T(B) = CA$  and  $AX = Y$.


This matrix $A$ is the matrix of the transformation $T$ with respect to the two bases. Either of
the properties (4.2.6) characterizes the matrix.


Proof. We write $T(v_j)$ as a linear combination of the basis $C$, say


(4.2.7)    $T(v_j) = w_1 a_{1j} + \cdots + w_m a_{mj}$,

and we assemble the coefficients $a_{ij}$ into a column vector $A_j = (a_{1j}, \ldots, a_{mj})^t$, so that
$T(v_j) = CA_j$. Then if $A$ is the matrix whose columns are $A_1, \ldots, A_n$,

(4.2.8)    $T(B) = (T(v_1), \ldots, T(v_n)) = (w_1, \ldots, w_m)A = CA$,


as claimed. Next, if $v = BX$, then

$T(v) = T(B)X = CAX$.


Therefore the coordinate vector of $T(v)$, which we named $Y$, is equal to $AX$.  □
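
As an illustration of the proposition, here is a sketch that builds the matrix of the derivative $P_n \to P_{n-1}$ from Section 4.1. It assumes the monomial bases $(1, t, \ldots, t^n)$ and $(1, t, \ldots, t^{n-1})$, a choice made here for illustration; the function names are hypothetical. Column $j$ of the matrix is the coordinate vector of the derivative of $t^j$, namely $j\,t^{j-1}$.

```python
def derivative_matrix(n):
    """Matrix of d/dt : P_n -> P_{n-1} with respect to the monomial bases.

    Row i, column j holds the coefficient of t^i in the derivative of t^j,
    which is j when j == i + 1 and 0 otherwise.
    """
    return [[(j if j == i + 1 else 0) for j in range(n + 1)] for i in range(n)]

def apply(A, X):
    """Compute the coordinate vector Y = AX."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = derivative_matrix(3)
X = [5, 4, 3, 2]          # coordinates of 5 + 4t + 3t^2 + 2t^3
Y = apply(A, X)           # coordinates of its derivative 4 + 6t + 6t^2
print(A)
print(Y)
```

The matrix has size $3 \times 4$, matching the dimensions of $P_2$ and $P_3$, and multiplying a coordinate vector by it differentiates the polynomial, which is the property $AX = Y$ of (4.2.6).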


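The isomorphisms $F^n \to V$ and $F^m \to W$ determined by the two bases (3.5.3) help to
explain the relationship between $T$ and $A$. If we use those isomorphisms to identify $V$ and
$W$ with $F^n$ and $F^m$, then $T$ corresponds to multiplication by $A$, as shown in the diagram
below:


(4.2.9)    $\begin{array}{ccc} F^n & \xrightarrow{\;A\;} & F^m \\ {\scriptstyle B}\downarrow & & \downarrow{\scriptstyle C} \\ V & \xrightarrow{\;T\;} & W \end{array} \qquad \begin{array}{ccc} X & \rightsquigarrow & AX \\ \downarrow & & \downarrow \\ BX & \rightsquigarrow & T(B)X = CAX \end{array}$


Going from $F^n$ to $W$ along the two paths gives the same answer. A diagram that has this
property is said to be commutative. All diagrams in this book are commutative.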

Thus any linear transformation between finite-dimensional vector spaces V and W 
corresponds to matrix multiplication, once bases for the two spaces are chosen. This is a nice 
result, but if we change bases we can do much better.

Theorem 4.2.10


(a) Vector space form: Let $T: V \to W$ be a linear transformation between finite-dimensional
vector spaces. There are bases $B$ and $C$ of $V$ and $W$, respectively, such that the matrix
of $T$ with respect to these bases has the form


(4.2.11)    $A' = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$,


where $I_r$ is the $r \times r$ identity matrix and $r$ is the rank of $T$.


(b) Matrix form: Given an $m \times n$ matrix $A$, there are invertible matrices $Q$ and $P$ such that
$A' = Q^{-1}AP$ has the form shown above.


Proof. (a) Let $(u_1, \ldots, u_k)$ be a basis for the kernel of $T$. We extend this set to a basis $B$
of $V$, listing the additional vectors first, say $(v_1, \ldots, v_r; u_1, \ldots, u_k)$, where $r + k = n$. Let
$w_i = T(v_i)$. Then, as in the proof of (4.1.6), one sees that $(w_1, \ldots, w_r)$ is a basis for the
image of $T$. We extend this set to a basis $C$ of $W$, say $(w_1, \ldots, w_r; z_1, \ldots, z_s)$, listing the
additional vectors last. The matrix of $T$ with respect to these bases has the form (4.2.11).
Part (b) of the theorem can be proved using row and column operations. The proof is
Exercise 2.4.  □


This theorem is a prototype for a number of results that are to come. It shows the 
advantage of working in vector spaces without fixed bases (or coordinates), because the 
structure of an arbitrary linear transformation is described by the very simple matrix (4.2.11). 
But why are (a) and (b) considered two versions of the same theorem? To answer this, we 
need to analyze the way that the matrix of a linear transformation changes when we make 
other choices of bases.

Let $A$ be the matrix of $T$ with respect to bases $B$ and $C$ of $V$ and $W$, as in (4.2.6), and
let $B' = (v'_1, \ldots, v'_n)$ and $C' = (w'_1, \ldots, w'_m)$ be new bases for $V$ and $W$. We can relate the
new basis $B'$ to the old basis $B$ by an invertible $n \times n$ matrix $P$, as in (3.5.11). Similarly, $C'$ is
related to $C$ by an invertible $m \times m$ matrix $Q$. These matrices have the properties


(4.2.12)    $B' = BP$, $PX' = X$  and  $C' = CQ$, $QY' = Y$.


Proposition 4.2.13 Let $A$ be the matrix of a linear transformation $T$ with respect to given
bases $B$ and $C$.


(a) Suppose that new bases $B'$ and $C'$ are related to the given bases by the matrices $P$ and
$Q$, as above. The matrix of $T$ with respect to the new bases is $A' = Q^{-1}AP$.

(b) The matrices $A'$ that represent $T$ with respect to other bases are those of the form
$A' = Q^{-1}AP$, where $Q$ and $P$ can be any invertible matrices of the appropriate sizes.


Proof. (a) We substitute $X = PX'$ and $Y = QY'$ into the equation $Y = AX$ (4.2.6), obtaining
$QY' = APX'$. So $Y' = (Q^{-1}AP)X'$. Since $A'$ is the matrix such that $A'X' = Y'$, this shows
that $A' = Q^{-1}AP$. Part (b) follows because the basechange matrices can be any invertible
matrices (3.5.9).  □

It follows from the proposition that the two parts of the theorem amount to the same
thing. To derive (a) from (b), we suppose given the linear transformation $T$, and we begin
with arbitrary choices of bases for $V$ and $W$, obtaining a matrix $A$. Part (b) tells us that there
are invertible matrices $P$ and $Q$ such that $A' = Q^{-1}AP$ has the form (4.2.11). When we use
these matrices to change bases in $V$ and $W$, the matrix $A$ is changed to $A'$.

To derive (b) from (a), we view an arbitrary matrix $A$ as the matrix of the linear
transformation “left multiplication by $A$” on column vectors. Then $A$ is the matrix of $T$ with
respect to the standard bases of $F^n$ and $F^m$, and (a) guarantees the existence of $P$, $Q$ so that
$Q^{-1}AP$ has the form (4.2.11).

We also learn something remarkable about matrix multiplication here, because left 
multiplication by a matrix is a linear transformation. Left multiplication by an arbitrary 
matrix A is the same as left multiplication by a matrix of the form (4.2.11), but with reference 
to different coordinates. 

In the future, we will often state a result in two equivalent ways, a vector space form 
and a matrix form, without stopping to show that the two forms are equivalent. Then we will 
present whichever proof seems simpler to write down. 


We can use Theorem 4.2.10 to derive another interesting property of matrix multiplication.
Let $N$ and $U$ denote the nullspace and column space of the transformation
$A: F^n \to F^m$. So $N$ is a subspace of $F^n$ and $U$ is a subspace of $F^m$. Let $k$ and $r$ denote the
dimensions of $N$ and $U$. So $k$ is the nullity of $A$ and $r$ is its rank.

Left multiplication by the transpose matrix $A^t$ defines a transformation $A^t: F^m \to F^n$
in the opposite direction, and therefore two more subspaces, the nullspace $N_1$ and the
column space $U_1$ of $A^t$. Here $U_1$ is a subspace of $F^n$, and $N_1$ is a subspace of $F^m$. Let
$k_1$ and $r_1$ denote the dimensions of $N_1$ and $U_1$, respectively. Theorem 4.1.6 tells us that
$k + r = n$, and also that $k_1 + r_1 = m$. Theorem 4.2.14 below gives one more relation among
these integers:


Theorem 4.2.14 With the above notation, $r_1 = r$: The rank of a matrix is equal to the rank
of its transpose.


Proof. Let $P$ and $Q$ be invertible matrices such that $A' = Q^{-1}AP$ has the form (4.2.11).
We begin by noting that the assertion is obvious for the matrix $A'$. Next, we examine the
diagrams


(4.2.15)    $\begin{array}{ccc} F^n & \xrightarrow{\;A'\;} & F^m \\ {\scriptstyle P}\downarrow & & \downarrow{\scriptstyle Q} \\ F^n & \xrightarrow{\;A\;} & F^m \end{array} \qquad \begin{array}{ccc} F^n & \xleftarrow{\;A'^t\;} & F^m \\ {\scriptstyle P^t}\uparrow & & \uparrow{\scriptstyle Q^t} \\ F^n & \xleftarrow{\;A^t\;} & F^m \end{array}$


The vertical arrows are bijective maps. Therefore, in the left-hand diagram, $Q$ carries the
column space of $A'$ (the image of multiplication by $A'$) bijectively to the column space of $A$.
The dimensions of these two column spaces, the ranks of $A$ and $A'$, are equal. Similarly, the
ranks of $A^t$ and $A'^t$ are equal. So to prove the theorem, we may replace the matrix $A$ by $A'$.
This reduces the proof to the trivial case of the matrix (4.2.11).  □

We can reinterpret the rank $r_1$ of the transpose matrix $A^t$. By definition, it is the
dimension of the space spanned by the columns of $A^t$, and this can equally well be thought
of as the dimension of the space of row vectors spanned by the rows of $A$. Because of this,
people often refer to $r_1$ as the row rank of $A$, and to $r$ as the column rank.

The row rank is the maximal number of independent rows of the matrix, and the
column rank is the maximal number of independent columns. Theorem 4.2.14 can be stated
this way:


Corollary 4.2.16 The row rank and the column rank of an $m \times n$ matrix $A$ are equal.  □
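
The corollary is easy to test on examples. The sketch below computes the rank of a sample matrix and of its transpose by Gaussian elimination over the rationals. The helper names `rank` and `transpose` and the matrix itself are assumptions made for illustration; its third row is the sum of the first two, so the rank is less than 3.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def transpose(rows):
    return [list(col) for col in zip(*rows)]

# Hypothetical 3x4 matrix: row 3 = row 1 + row 2.
A = [[1, 2, 0, 3],
     [2, 4, 1, 7],
     [3, 6, 1, 10]]
r = rank(A)                  # column rank of A
r1 = rank(transpose(A))      # rank of the transpose, i.e. the row rank of A
k = len(A[0]) - r            # nullity of A, from k + r = n
k1 = len(A) - r1             # nullity of the transpose, from k1 + r1 = m
print(r, r1, k, k1)
```

The two ranks agree, while the two nullities generally differ when $m \neq n$.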


4.3 LINEAR OPERATORS


In this section, we study linear transformations $T: V \to V$ that map a vector space to itself.
They are called linear operators. Left multiplication by a (square) $n \times n$ matrix with entries
in a field $F$ defines a linear operator on the space $F^n$ of column vectors.

For example, let $c = \cos\theta$ and $s = \sin\theta$. The rotation matrix (4.2.2)


$\begin{bmatrix} c & -s \\ s & c \end{bmatrix}$

is a linear operator on the plane $\mathbb{R}^2$.

The dimension formula $\dim(\ker T) + \dim(\operatorname{im} T) = \dim V$ is valid for linear operators.
But here, since the domain and range are equal, we have extra information that can be
combined with the formula. Both the kernel and the image of $T$ are subspaces of $V$.


Proposition 4.3.1 Let $K$ and $W$ denote the kernel and image, respectively, of a linear
operator $T$ on a finite-dimensional vector space $V$.


(a) The following conditions are equivalent:
• $T$ is bijective,
• $K = \{0\}$,
• $W = V$.
(b) The following conditions are equivalent:
• $V$ is the direct sum $K \oplus W$,
• $K \cap W = \{0\}$,
• $K + W = V$.


Proof. (a) $T$ is bijective if and only if the kernel $K$ is zero and the image $W$ is the whole
space $V$. If the kernel is zero, the dimension formula tells us that $\dim W = \dim V$, and
therefore $W = V$. Similarly, if $W = V$, the dimension formula shows that $\dim K = 0$, and
therefore $K = 0$. In both cases, $T$ is bijective.


(b) $V$ is the direct sum $K \oplus W$ if and only if both of the conditions $K \cap W = \{0\}$ and
$K + W = V$ hold. If $K \cap W = \{0\}$, then $K$ and $W$ are independent, so the sum $U = K + W$
is the direct sum $K \oplus W$, and $\dim U = \dim K + \dim W$ (3.6.6)(a). The dimension formula
shows that $\dim U = \dim V$, so $U = V$, and this shows that $K \oplus W = V$. If $K + W = V$,
the dimension formula and Proposition 3.6.6(a) show that $K$ and $W$ are independent, and
again, $V$ is the direct sum.  □

• A linear operator that satisfies the conditions (4.3.1)(a) is called an invertible operator.
Its inverse function is also a linear operator. An operator that is not invertible is a singular
operator.


• The conditions of Proposition 4.3.1(a) are not equivalent when the dimension of $V$
is infinite. For example, let $V = \mathbb{R}^\infty$ be the space of infinite row vectors $(a_1, a_2, \ldots)$ (see
Section 3.7). The kernel of the right shift operator $S^+$, defined by


(4.3.2)    $S^+(a_1, a_2, \ldots) = (0, a_1, a_2, \ldots)$


is the zero space, and its image is a proper subspace of $V$. The kernel of the left shift operator
$S^-$, defined by

$S^-(a_1, a_2, a_3, \ldots) = (a_2, a_3, \ldots)$,

is a proper subspace of $V$, and its image is the whole space.
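
The two shift operators can be modeled directly. In the sketch below, an infinite row vector is represented as a Python function sending the index $i \geq 1$ to the coordinate $a_i$. This representation and all names are assumptions made for illustration, and only finite prefixes are ever printed.

```python
def right_shift(a):
    """S+ : (a1, a2, ...) -> (0, a1, a2, ...)."""
    return lambda i: 0 if i == 1 else a(i - 1)

def left_shift(a):
    """S- : (a1, a2, a3, ...) -> (a2, a3, ...)."""
    return lambda i: a(i + 1)

def prefix(a, n):
    """First n coordinates of the sequence a."""
    return [a(i) for i in range(1, n + 1)]

a = lambda i: i * i                   # the vector (1, 4, 9, 16, ...)
e1 = lambda i: 1 if i == 1 else 0     # the vector (1, 0, 0, ...)

print(prefix(left_shift(right_shift(a)), 5))   # S- after S+ returns a
print(prefix(right_shift(left_shift(a)), 5))   # S+ after S- loses a1
print(prefix(left_shift(e1), 5))               # e1 lies in the kernel of S-
```

So $S^-S^+$ is the identity, which shows again that $S^+$ is injective and $S^-$ is surjective, while $S^+S^-$ is not the identity: neither operator is bijective.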


The discussion of bases in the previous section must be changed slightly when we are 
dealing with linear operators. We should pick only one basis B for V, and use it in place of 
both of the bases B and C in (4.2.6). In other words, to define the matrix A of T with respect 
to the basis B, we should write 


(4.3.3) T(B) =BA, and AX=Y as before. 


As with any linear transformation (4.2.7), the columns of $A$ are the coordinate vectors of the
images $T(v_j)$ of the basis vectors:


(4.3.4)    $T(v_j) = v_1 a_{1j} + \cdots + v_n a_{nj}$.


A linear operator is invertible if and only if its matrix with respect to an arbitrary basis is an
invertible matrix.

When one speaks of the matrix of a linear operator on the space $F^n$, it is assumed
that the basis is the standard basis $E$, unless a different basis is specified. The operator is then
multiplication by that matrix.


A new feature arises when we study the effect of a change of basis. Suppose that B is 
replaced by a new basis B’. 


Proposition 4.3.5 Let $A$ be the matrix of a linear operator $T$ with respect to a basis $B$.


(a) Suppose that a new basis $B'$ is described by $B' = BP$. The matrix that represents $T$ with
respect to this basis is $A' = P^{-1}AP$.

(b) The matrices $A'$ that represent the operator $T$ for different bases are the matrices of the
form $A' = P^{-1}AP$, where $P$ can be any invertible matrix.  □


In other words, the matrix changes by conjugation. This is a confusing fact to grasp.
So, though it follows from (4.2.13), we will rederive it. Since $B' = BP$ and since $T(B) = BA$,
we have

$T(B') = T(B)P = BAP$.

We are not done. The formula we have obtained expresses $T(B')$ in terms of the old basis $B$.
To obtain the new matrix, we must write $T(B')$ in terms of the new basis $B'$. So we substitute
$B = B'P^{-1}$ into the equation. Doing so gives us $T(B') = B'P^{-1}AP$.  □

In general, we say that a square matrix $A$ is similar to another matrix $A'$ if $A' = P^{-1}AP$
for some invertible matrix $P$. Such a matrix $A'$ is obtained from $A$ by conjugating by $P^{-1}$,
and since $P$ can be any invertible matrix, $P^{-1}$ is also arbitrary. It would be correct to use the
term conjugate in place of similar.
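
Because conjugation is easy to get backwards, a small numerical check may help. The sketch below uses exact rational arithmetic; the matrices $A$ and $P$ and the helper names are hypothetical choices, with the old basis taken to be the standard basis of $F^2$, so that the columns of $P$ are the new basis vectors. It computes $A' = P^{-1}AP$ and confirms that $AP = PA'$, which says that column $j$ of $A'$ holds the coordinates of $T(v'_j)$ in the new basis.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(P):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("P is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1],
     [0, 3]]            # matrix of T in the old (standard) basis
P = [[1, 1],
     [1, 2]]            # columns are the new basis vectors v'_1, v'_2
A_new = matmul(inverse_2x2(P), matmul(A, P))   # A' = P^{-1} A P
print(A_new)
print(matmul(A, P) == matmul(P, A_new))        # T(B') = B'A' in coordinates
```

For these choices $T(v'_1) = (3, 3)^t = 3v'_1$ and $T(v'_2) = (4, 6)^t = 2v'_1 + 2v'_2$, which are the two columns of $A'$.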

Now if we are given the matrix A, it is natural to look for a similar matrix A’ that 
is particularly simple. One would like to get a result somewhat like Theorem 4.2.10. But 
here our allowable change is much more restricted, because we have only one basis, and 
therefore one matrix P, to work with. Having domain and range of a linear transformation 
equal, which seems at first to be a simplification, actually makes things more difficult. 

We can get some insight into the problem by writing the hypothetical basechange 
matrix as a product of elementary matrices, say $P = E_1 \cdots E_r$. Then


$P^{-1}AP = E_r^{-1} \cdots E_1^{-1} A E_1 \cdots E_r$.


In terms of elementary operations, we are allowed to change $A$ by a sequence of steps
$A \rightsquigarrow E^{-1}AE$. In other words, we may perform an arbitrary column operation $E$ on $A$,
but we must also make the row operation that corresponds to the inverse matrix $E^{-1}$.
Unfortunately, these row and column operations interact, and analyzing them becomes 
confusing. 


4.4 EIGENVECTORS 


The main tools for analyzing a linear operator $T: V \to V$ are invariant subspaces and
eigenvectors.


• A subspace $W$ of $V$ is invariant, or more precisely $T$-invariant, if it is carried to itself by
the operator:


(4.4.1)    $TW \subset W$.


In other words, $W$ is invariant if, whenever $w$ is in $W$, $T(w)$ is also in $W$. When this is so, $T$
defines a linear operator on $W$, called its restriction to $W$. We often denote this restriction
by $T|_W$.

If $W$ is a $T$-invariant subspace, we may form a basis $B$ of $V$ by appending vectors to a
basis $(w_1, \ldots, w_k)$ of $W$, say


(4.4.2)    $B = (w_1, \ldots, w_k; v_1, \ldots, v_{n-k})$.


Then the fact that $W$ is invariant is reflected in the matrix of $T$. The columns of this matrix,
we'll call it $M$, are the coordinate vectors of the image vectors (see (4.3.3)). But $T(w_j)$ is
in the subspace $W$, so it is a linear combination of the basis $(w_1, \ldots, w_k)$. When we write
$T(w_j)$ in terms of the basis $B$, the coefficients of the vectors $v_1, \ldots, v_{n-k}$ will be zero. It
follows that $M$ will have the block form

(4.4.3)    $M = \begin{bmatrix} A & B \\ 0 & D \end{bmatrix}$,


where $A$ is a $k \times k$ matrix, the matrix of the restriction of $T$ to $W$.

If $V$ happens to be the direct sum $W_1 \oplus W_2$ of two $T$-invariant subspaces, and if we
make a basis $B = (B_1, B_2)$ of $V$ by appending bases of $W_1$ and $W_2$, the matrix of $T$ will have
the block diagonal form


(4.4.4)    $M = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}$,


where $A_i$ is the matrix of the restriction of $T$ to $W_i$.
The concept of an eigenvector is closely related to that of an invariant subspace. 


• An eigenvector $v$ of a linear operator $T$ is a nonzero vector such that

(4.4.5)    $T(v) = \lambda v$


for some scalar $\lambda$, i.e., some element of $F$. A nonzero column vector is an eigenvector of a
square matrix $A$ if it is an eigenvector for the operation of left multiplication by $A$.


The scalar $\lambda$ that appears in (4.4.5) is called the eigenvalue associated to the eigenvector
$v$. When we speak of an eigenvalue of a linear operator $T$ or of a matrix $A$ without specifying
an eigenvector, we mean a scalar $\lambda$ that is the eigenvalue associated to some eigenvector.
An eigenvalue may be any element of $F$, including zero, but an eigenvector is not allowed
to be zero. Eigenvalues are often denoted, as here, by the Greek letter $\lambda$ (lambda).¹

An eigenvector with eigenvalue 1 is a fixed vector: $T(v) = v$. An eigenvector with
eigenvalue zero is in the nullspace: $T(v) = 0$. When $V = \mathbb{R}^n$, a nonzero vector $v$ is an
eigenvector if $v$ and $T(v)$ are parallel.

If $v$ is an eigenvector of a linear operator $T$, with eigenvalue $\lambda$, the subspace $W$
spanned by $v$ will be $T$-invariant, because $T(cv) = c\lambda v$ is in $W$ for all scalars $c$. Conversely,
if the one-dimensional subspace spanned by v is invariant, then v is an eigenvector. So an 
eigenvector can be described as a basis of a one-dimensional invariant subspace. 

It is easy to tell whether or not a given vector X is an eigenvector of a matrix A. We 
simply check whether or not AX is a multiple of X. And, if A is the matrix of T with respect 
to a basis B, and if X is the coordinate vector of a vector v, then X is an eigenvector of A if 
and only if v is an eigenvector for T. 

The standard basis vector $e_1 = (1, 0)^t$ is an eigenvector, with eigenvalue 3, of the
matrix

$\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}$.


The vector $(1, -1)^t$ is another eigenvector, with eigenvalue 2. The vector $(0, 1, 1)^t$ is an
eigenvector, with eigenvalue 2, of the matrix


$A = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & 1 \\ 3 & 0 & 2 \end{bmatrix}$.
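
The test described above, whether $AX$ is a multiple of $X$, is short to write down. The following is a sketch for integer matrices and vectors; the function name `eigenvalue_of` is a hypothetical helper, and it is applied to the $2 \times 2$ matrix with rows $(3, 1)$ and $(0, 2)$ from the example.

```python
from fractions import Fraction

def eigenvalue_of(A, X):
    """Return the scalar lam with AX = lam * X, or None if X is not an eigenvector.

    A is a square matrix (list of rows) and X a vector, both with integer entries.
    """
    if all(x == 0 for x in X):
        raise ValueError("an eigenvector must be nonzero")
    AX = [sum(a * x for a, x in zip(row, X)) for row in A]
    i = next(k for k, x in enumerate(X) if x != 0)    # a nonzero coordinate of X
    lam = Fraction(AX[i], X[i])                       # the only possible eigenvalue
    return lam if all(ax == lam * x for ax, x in zip(AX, X)) else None

A = [[3, 1],
     [0, 2]]
print(eigenvalue_of(A, [1, 0]))     # e1 is an eigenvector
print(eigenvalue_of(A, [1, -1]))    # so is (1, -1)^t
print(eigenvalue_of(A, [0, 1]))     # e2 is not: Ae2 = (1, 2)^t
```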


1 The German word “eigen” means roughly ‘“‘characteristic.” Eigenvectors and eigenvalues are sometimes called 
characteristic vectors and characteristic values. 
If $(v_1, \ldots, v_n)$ is a basis of V and if $v_1$ is an eigenvector of a linear operator T, the
matrix of T will have the block form

(4.4.6)   $$\begin{bmatrix} \lambda & B \\ 0 & D \end{bmatrix},$$

where $\lambda$ is the eigenvalue of $v_1$. This is the block form (4.4.3) in the case of an invariant
subspace of dimension 1.


Proposition 4.4.7 Similar matrices ($A' = P^{-1}AP$) have the same eigenvalues.


This is true because similar matrices represent the same linear transformation. □


Proposition 4.4.8 


(a) Let T be a linear operator on a vector space V. The matrix of T with respect to a basis
$B = (v_1, \ldots, v_n)$ is diagonal if and only if each of the basis vectors $v_j$ is an eigenvector.

(b) An $n \times n$ matrix A is similar to a diagonal matrix if and only if there is a basis of $F^n$ that
consists of eigenvectors.


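This follows from the definition of the matrix A (see (4.3.4)). If $T(v_j) = \lambda_j v_j$, then

(4.4.9)   $$T(B) = (v_1\lambda_1, \ldots, v_n\lambda_n) = (v_1, \ldots, v_n)\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}.$$ □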


This proposition shows that we can represent a linear operator simply by a diagonal 
matrix, provided that it has enough eigenvectors. We will see in Section 4.5 that every linear 
operator on a complex vector space has at least one eigenvector, and in Section 4.6 that 
in most cases there is a basis of eigenvectors. But a linear operator on a real vector space 
needn’t have any eigenvector. For example, a rotation of the plane through an angle $\theta$
doesn’t carry any vector to a parallel one unless $\theta$ is 0 or $\pi$. The rotation matrix (4.2.2) with
$\theta \neq 0, \pi$ has no real eigenvector.


• A general example of a real matrix that has at least one real eigenvalue is one all of whose
entries are positive. Such matrices, called positive matrices, occur often in applications, 
and one of their most important properties is that they always have an eigenvector whose 
coordinates are positive (a positive eigenvector). 


Instead of proving this fact, we’ll illustrate it by examining the effect of multiplication
by a positive $2 \times 2$ matrix A on $\mathbb{R}^2$. Let $w_i = Ae_i$ be the columns of A. The parallelogram
law for vector addition shows that A sends the first quadrant S to the sector bounded by the
vectors $w_1$ and $w_2$. The coordinate vector of $w_i$ is the ith column of A. Since the entries of
A are positive, the vectors $w_i$ lie in the first quadrant. So A carries the first quadrant to itself:
$S \supset AS$. Applying A to this inclusion, we find $AS \supset A^2S$, and so on:

(4.4.10)   $$S \supset AS \supset A^2S \supset A^3S \supset \cdots,$$

as is illustrated below for the matrix $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$.

Now, the intersection of a nested set of sectors is either a sector or a half-line. In our
case, the intersection $Z = \bigcap A^rS$ turns out to be a half-line. This is intuitively plausible,
and it can be shown in various ways, but we’ll omit the proof. We multiply the relation
$Z = \bigcap A^rS$ on both sides by A:

$$AZ = A\Bigl(\bigcap_{r=0}^{\infty} A^rS\Bigr) = \bigcap_{r=1}^{\infty} A^rS = Z.$$

Hence Z = AZ. Therefore the nonzero vectors in Z are eigenvectors.


[Figure]

(4.4.11) Images of the First Quadrant Under Repeated Multiplication by
a Positive Matrix.
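The shrinking of the sectors can also be observed numerically. The sketch below is an illustration rather than a proof: it tracks the slopes of the two rays that bound $A^rS$ for the positive matrix with rows (3, 2) and (1, 4). The function names are assumptions made for this example.

```python
from fractions import Fraction

A = [[3, 2], [1, 4]]

def apply(A, v):
    """Multiply a 2x2 matrix by a vector of length 2."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def sector_slopes(A, r):
    """Slopes y/x of the two boundary rays A^r e_1 and A^r e_2 of A^r S, for r >= 1."""
    w1, w2 = [1, 0], [0, 1]          # boundary rays of the first quadrant S
    for _ in range(r):
        w1, w2 = apply(A, w1), apply(A, w2)
    return Fraction(w1[1], w1[0]), Fraction(w2[1], w2[0])

for r in (1, 2, 5, 10):
    low, high = sector_slopes(A, r)
    print(r, float(low), float(high))
```

For this matrix both slopes approach 1, so the sectors close down on the half-line through $(1, 1)^t$, and one checks directly that $A(1, 1)^t = (5, 5)^t$.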


4.5 THE CHARACTERISTIC POLYNOMIAL 


In this section we determine the eigenvectors of an arbitrary linear operator. We recall that
an eigenvector of a linear operator T is a nonzero vector v such that

(4.5.1)   $$T(v) = \lambda v$$

for some $\lambda$ in F. If we don’t know $\lambda$, it can be difficult to find the eigenvector directly when
the matrix of the operator is complicated. The trick is to solve a different problem, namely
to determine the eigenvalues first. Once an eigenvalue $\lambda$ is determined, equation (4.5.1)
becomes linear in the coordinates of v, and solving it presents no problem.

We begin by writing (4.5.1) in the form

(4.5.2)   $$[\lambda I - T](v) = 0,$$
where I stands for the identity operator and $\lambda I - T$ is the linear operator defined by

(4.5.3)   $$[\lambda I - T](v) = \lambda v - T(v).$$

It is easy to check that $\lambda I - T$ is indeed a linear operator. We can restate (4.5.2) as follows:

(4.5.4)   A nonzero vector v is an eigenvector with eigenvalue $\lambda$
          if and only if it is in the kernel of $\lambda I - T$.


Corollary 4.5.5 Let T be a linear operator on a finite-dimensional vector space V.

(a) The eigenvalues of T are the scalars $\lambda$ in F such that the operator $\lambda I - T$ is singular,
i.e., its nullspace is not zero.

(b) The following conditions are equivalent:

• T is a singular operator.
• T has an eigenvalue equal to zero.
• If A is the matrix of T with respect to an arbitrary basis, then det A = 0. □


If A is the matrix of T with respect to some basis, then the matrix of $\lambda I - T$ is $\lambda I - A$.
So $\lambda I - T$ is singular if and only if $\det(\lambda I - A) = 0$. This determinant can be computed
with indeterminate $\lambda$, and doing so provides us, at least in principle, with a method for
determining the eigenvalues and eigenvectors.

Suppose for example that A is the matrix $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ whose action on $\mathbb{R}^2$ is illustrated in
Figure (4.4.11). Then

$$\lambda I - A = \begin{bmatrix} \lambda - 3 & -2 \\ -1 & \lambda - 4 \end{bmatrix}$$

and

$$\det(\lambda I - A) = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2).$$

The determinant vanishes when $\lambda = 5$ or 2, so the eigenvalues of A are 5 and 2. To find the
eigenvectors, we solve the two systems of equations $[5I - A]X = 0$ and $[2I - A]X = 0$. The
solutions are determined up to scalar factor:

(4.5.6)   $$v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}.$$
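The two-step method just described (first the eigenvalues, then the eigenvectors) can be sketched for a real $2 \times 2$ matrix as follows. The function name `eigen_2x2` is an assumption made for this illustration, and the eigenvectors checked at the end are one choice of solutions of $[5I - A]X = 0$ and $[2I - A]X = 0$.

```python
import math

def eigen_2x2(A):
    """Real roots of t^2 - (trace A) t + det A, largest first ([] if there are none)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        return []                      # no real eigenvalue
    s = math.sqrt(disc)
    return [(tr + s) / 2, (tr - s) / 2]

A = [[3, 2], [1, 4]]
print(eigen_2x2(A))                    # roots of t^2 - 7t + 10

# Check candidate eigenvectors by direct multiplication.
for lam, v in ((5, (1, 1)), (2, (2, -1))):
    Av = (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])
    print(lam, Av == (lam * v[0], lam * v[1]))
```

A rotation by a quarter turn, with rows (0, -1) and (1, 0), gives an empty list, in agreement with the remark that a rotation need not have a real eigenvector.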


We now consider the same computation for an indeterminate matrix of arbitrary size.
It is customary to replace the symbol $\lambda$ by a variable t. We form the matrix $tI - A$:

(4.5.7)   $$tI - A = \begin{bmatrix} (t - a_{11}) & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & (t - a_{22}) & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & (t - a_{nn}) \end{bmatrix}.$$

The complete expansion of the determinant [Chapter 1 (1.6.4)] shows that $\det(tI - A)$ is a
polynomial of degree n in t whose coefficients are scalars, elements of F.
Definition 4.5.8 The characteristic polynomial of a linear operator T is the polynomial

$$p(t) = \det(tI - A),$$

where A is the matrix of T with respect to some basis.

The eigenvalues of T are determined by combining (4.5.5) and (4.5.8):


Corollary 4.5.9 The eigenvalues of a linear operator are the roots of its characteristic
polynomial. □


Corollary 4.5.10 Let A be an upper or lower triangular $n \times n$ matrix with diagonal entries
$a_{11}, \ldots, a_{nn}$. The characteristic polynomial of A is $(t - a_{11}) \cdots (t - a_{nn})$. The diagonal
entries of A are its eigenvalues.


Proof. If A is upper triangular, so is $tI - A$, and the diagonal entries of $tI - A$ are $t - a_{ii}$.
The determinant of a triangular matrix is the product of its diagonal entries. □


Proposition 4.5.11 The characteristic polynomial of an operator T does not depend on the 
choice of a basis. 


Proof. A second basis leads to a matrix $A' = P^{-1}AP$ (4.3.5), and
$tI - A' = tI - P^{-1}AP = P^{-1}(tI - A)P$. Then
$$\det(tI - A') = \det P^{-1}\,\det(tI - A)\,\det P = \det(tI - A).$$ □


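The characteristic polynomial of the $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is

(4.5.12)   $$p(t) = \det(tI - A) = \det\begin{bmatrix} t - a & -b \\ -c & t - d \end{bmatrix} = t^2 - (\operatorname{trace} A)\,t + (\det A),$$

where trace A = a + d.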

An incomplete description of the characteristic polynomial of an n Xn matrix is 
given by the next proposition, which is proved by computation. It wouldn’t be very 
difficult to determine the remaining coefficients, but explicit formulas for them aren’t often 
used. 


Proposition 4.5.13 The characteristic polynomial of an $n \times n$ matrix A has the form
$$p(t) = t^n - (\operatorname{trace} A)\,t^{n-1} + (\text{intermediate terms}) + (-1)^n(\det A),$$
where trace A, the trace of A, is the sum of its diagonal entries:
$$\operatorname{trace} A = a_{11} + a_{22} + \cdots + a_{nn}.$$ □


Proposition 4.5.11 shows that all coefficients of the characteristic polynomial are
independent of the basis. For instance, $\operatorname{trace}(P^{-1}AP) = \operatorname{trace} A$.
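For readers who want to see all the coefficients, including the intermediate ones, here is a minimal sketch that computes $\det(tI - A)$ exactly. It uses the Faddeev-LeVerrier recursion, a standard method that is not the one used in the text, and the function name `char_poly` is an assumption of this example.

```python
from fractions import Fraction

def char_poly(A):
    """Coefficients of det(tI - A), highest power of t first (Faddeev-LeVerrier)."""
    n = len(A)
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # M_1 = I
    coeffs = [Fraction(1)]                                             # coefficient of t^n
    for k in range(1, n + 1):
        AM = [[sum(Fraction(A[i][l]) * M[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k       # coefficient of t^(n-k)
        coeffs.append(c)
        # M_(k+1) = A M_k + c I
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

print(char_poly([[3, 2], [1, 4]]))          # t^2 - 7t + 10
U = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]       # hypothetical upper triangular matrix
print(char_poly(U))                         # (t - 1)(t - 4)(t - 6)
```

In both examples the second coefficient is minus the trace and the last one is $(-1)^n \det A$, as the proposition states, and the triangular example agrees with Corollary 4.5.10.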
Since the characteristic polynomial, the trace, and the determinant are independent of
the basis, they depend only on the operator T. So we may define the terms characteristic 
polynomial, trace, and determinant of a linear operator T. They are the ones obtained using 
the matrix of T with respect to any basis. 


Proposition 4.5.14 Let T be a linear operator on a finite-dimensional vector space V. 


(a) If V has dimension n, then T has at most n eigenvalues. 
(b) If F is the field of complex numbers and $V \neq \{0\}$, then T has at least one eigenvalue, and
hence at least one eigenvector.


Proof. (a) The eigenvalues are the roots of the characteristic polynomial, which has degree
n. A polynomial of degree n can have at most n roots. This is true for a polynomial with
coefficients in any field F (see (12.2.20)).


(b) The Fundamental Theorem of Algebra asserts that every polynomial of positive degree
with complex coefficients has at least one complex root. There is a proof of this theorem in
Chapter 15 (15.10.1). □


For example, let $R_\theta$ be the matrix (4.2.2) that represents the counterclockwise rotation
of $\mathbb{R}^2$ through an angle $\theta$. Its characteristic polynomial, $p(t) = t^2 - (2\cos\theta)t + 1$, has no
real root provided that $\theta \neq 0, \pi$, so $R_\theta$ has no real eigenvalue. We have observed this before. But
the operator on $\mathbb{C}^2$ defined by $R_\theta$ does have the complex eigenvalues $e^{\pm i\theta}$.
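As a numerical check of the last statement, one can solve $t^2 - (2\cos\theta)t + 1 = 0$ over the complex numbers and compare the roots with $e^{\pm i\theta}$. This is a floating-point sketch; the function name and the sample angle are choices made for the illustration.

```python
import cmath
import math

def rotation_eigenvalues(theta):
    """Roots of t^2 - (2 cos theta) t + 1, computed with complex arithmetic."""
    b = -2 * math.cos(theta)
    root = cmath.sqrt(b * b - 4)       # purely imaginary when theta is not 0 or pi
    return (-b + root) / 2, (-b - root) / 2

theta = math.pi / 3
r1, r2 = rotation_eigenvalues(theta)
print(r1, cmath.exp(1j * theta))
print(r2, cmath.exp(-1j * theta))
```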


Note: When we speak of the roots of a polynomial p(t) or the eigenvalues of a matrix or 
linear operator, repetitions corresponding to multiple roots are supposed to be included. 
This terminology is convenient, though imprecise. □


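Corollary 4.5.15 If $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of an $n \times n$ complex matrix A, then det A
is the product $\lambda_1 \cdots \lambda_n$, and trace A is the sum $\lambda_1 + \cdots + \lambda_n$.


Proof. Let p(t) be the characteristic polynomial of A. Then
$$(t - \lambda_1) \cdots (t - \lambda_n) = p(t) = t^n - (\operatorname{trace} A)\,t^{n-1} + \cdots \pm (\det A).$$ □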


4.6 TRIANGULAR AND DIAGONAL FORMS 


In this section, we show that for “most” linear operators on a complex vector space there is
a basis such that the matrix of the operator is diagonal. The key fact, which was noted at the 
end of Section 4.5, is that every complex polynomial of positive degree has a root. This tells 
us that every linear operator has at least one eigenvector. 


Proposition 4.6.1 


(a) Vector space form: Let T be a linear operator on a finite-dimensional complex vector
space V. There is a basis B of V such that the matrix of T with respect to that basis is
upper triangular.


(b) Matrix form: Every complex $n \times n$ matrix A is similar to an upper triangular matrix:
There is a matrix $P \in GL_n(\mathbb{C})$ such that $P^{-1}AP$ is upper triangular.




Proof. The two assertions are equivalent, because of (4.3.5). We will work with the matrix.
Let $V = \mathbb{C}^n$. Proposition 4.5.14(b) shows that V contains an eigenvector of A, call it $v_1$. Let
$\lambda$ be its eigenvalue. We extend $(v_1)$ to a basis $B = (v_1, \ldots, v_n)$ for V. The new matrix
$A' = P^{-1}AP$ has the block form

(4.6.2)   $$A' = \begin{bmatrix} \lambda & * \\ 0 & D \end{bmatrix},$$

where D is an $(n-1) \times (n-1)$ matrix (see (4.4.6)). By induction on n, we may assume that
the existence of a matrix $Q \in GL_{n-1}(\mathbb{C})$ such that $Q^{-1}DQ$ is upper triangular has been
proved. Let

$$Q_1 = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}. \quad \text{Then} \quad A'' = Q_1^{-1}A'Q_1 = \begin{bmatrix} \lambda & * \\ 0 & Q^{-1}DQ \end{bmatrix}$$

is upper triangular, and $A'' = (PQ_1)^{-1}A(PQ_1)$. □


Corollary 4.6.3 Proposition 4.6.1 continues to hold when the phrase “upper triangular” is
replaced by “lower triangular.”


The lower triangular form is obtained by listing the basis B of (4.6.1)(a) in reverse
order.


The important point for the proof of Proposition 4.6.1 is that every complex polynomial
of positive degree has a root. The same proof will work for any field F, provided that all the roots of the
characteristic polynomial are in the field.


Corollary 4.6.4 


(a) Vector space form: Let T be a linear operator on a finite-dimensional vector space V
over a field F, and suppose that the characteristic polynomial of T is a product of linear
factors in the field F. There is a basis B of V such that the matrix A of T is upper (or
lower) triangular.

(b) Matrix form: Let A be an $n \times n$ matrix with entries in F, whose characteristic polynomial
is a product of linear factors. There is a matrix $P \in GL_n(F)$ such that $P^{-1}AP$ is upper
(or lower) triangular.


The proof is the same, except that to make the induction step one has to check that the
characteristic polynomial of the matrix D that appears in (4.6.2) is $p(t)/(t - \lambda)$, where p(t)
is the characteristic polynomial of A. Then the hypothesis that the characteristic polynomial
factors into linear factors carries over from A to D. □


We now ask which matrices A are similar to diagonal matrices. They are called 
diagonalizable matrices. As we saw in (4.4.8) (b), they are the matrices that have bases 
of eigenvectors. Similarly, a linear operator that has a basis of eigenvectors is called a 
diagonalizable operator. The diagonal entries are determined, except for their order, by the
linear operator T. They are the eigenvalues.


Theorem 4.6.6 below gives a partial answer to our question; a more complete answer 
will be given in the next section. 
Proposition 4.6.5 Let $v_1, \ldots, v_r$ be eigenvectors of a linear operator T with distinct
eigenvalues $\lambda_1, \ldots, \lambda_r$. The set $(v_1, \ldots, v_r)$ is independent.


Proof. We use induction on r. The assertion is true when r = 1, because an eigenvector
cannot be zero. Suppose that a dependence relation

$$0 = a_1v_1 + \cdots + a_rv_r$$

is given. We must show that $a_i = 0$ for all i. We apply the operator T:

$$0 = T(0) = a_1T(v_1) + \cdots + a_rT(v_r) = a_1\lambda_1v_1 + \cdots + a_r\lambda_rv_r.$$

This is a second dependence relation among $(v_1, \ldots, v_r)$. We eliminate $v_r$ from the two
relations, multiplying the first relation by $\lambda_r$ and subtracting the second:

$$0 = a_1(\lambda_r - \lambda_1)v_1 + \cdots + a_{r-1}(\lambda_r - \lambda_{r-1})v_{r-1}.$$

Applying induction, we may assume that $(v_1, \ldots, v_{r-1})$ is an independent set. This tells us
that the coefficients $a_i(\lambda_r - \lambda_i)$, i < r, are all zero. Since the $\lambda_i$ are distinct, $\lambda_r - \lambda_i$ is not
zero if i < r. Thus $a_1 = \cdots = a_{r-1} = 0$. The original relation reduces to $0 = a_rv_r$. Since an
eigenvector cannot be zero, $a_r$ is zero too. □


The next theorem follows by combining (4.4.8) and (4.6.5): 


Theorem 4.6.6 Let T be a linear operator on a vector space V of dimension n over a field
F. If its characteristic polynomial has n distinct roots in F, there is a basis for V with respect
to which the matrix of T is diagonal. □


Note: Diagonalization is a powerful tool. When one is presented with a diagonalizable 
operator, it should be an automatic response to work with a basis of eigenvectors. 
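
As an example of diagonalization, consider the real matrix

(4.6.7)   $$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}.$$

Its eigenvectors were computed in (4.5.6). These eigenvectors form a basis $B = (v_1, v_2)$ of
$\mathbb{R}^2$. According to (3.5.13), the matrix relating the standard basis E to this basis B is

(4.6.8)   $$P = [B] = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}, \qquad P^{-1} = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}, \quad \text{and}$$

(4.6.9)   $$P^{-1}AP = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 5 & \\ & 2 \end{bmatrix} = \Lambda.$$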


The next proposition is a variant of Proposition 4.4.8. We omit the proof. 
Proposition 4.6.10 Let F be a field.


(a) Let T be a linear operator on $F^n$, with matrix A. If $B = (v_1, \ldots, v_n)$ is a basis of eigenvectors of T, and
if P = [B], then $\Lambda = P^{-1}AP = [B]^{-1}A[B]$ is diagonal.
(b) Let $B = (v_1, \ldots, v_n)$ be a basis of $F^n$, and let $\Lambda$ be the diagonal matrix with diagonal
entries $\lambda_1, \ldots, \lambda_n$ that are not necessarily distinct. There is a unique matrix A such
that, for $i = 1, \ldots, n$, $v_i$ is an eigenvector of A with eigenvalue $\lambda_i$, namely the matrix
$[B]\Lambda[B]^{-1}$. □


A nice way to write the equation $[B]^{-1}A[B] = \Lambda$ is
(4.6.11)   $$A[B] = [B]\Lambda.$$
One application of Theorem 4.6.6 is to compute the powers of a diagonalizable matrix. 


The next lemma needs to be pointed out, though it follows trivially when one expands the
left sides of the equations and cancels $PP^{-1}$.


Lemma 4.6.12 Let A, B, and P be $n \times n$ matrices. If P is invertible, then $(P^{-1}AP)(P^{-1}BP) = P^{-1}(AB)P$,
and for all $k \geq 1$, $(P^{-1}AP)^k = P^{-1}A^kP$. □


Thus if A, P, and $\Lambda$ are as in (4.6.9), then

$$A^k = P\Lambda^kP^{-1} = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 5 & \\ & 2 \end{bmatrix}^k\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 5^k + 2 \cdot 2^k & 2 \cdot 5^k - 2 \cdot 2^k \\ 5^k - 2^k & 2 \cdot 5^k + 2^k \end{bmatrix}.$$
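The closed formula for the powers can be compared with repeated multiplication. In the sketch below, the matrix is the one with rows (3, 2) and (1, 4), the entries of the closed form come from multiplying out $P\Lambda^kP^{-1}$ with the eigenvalues 5 and 2, and the function names are assumptions made for the illustration.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def power_by_multiplication(A, k):
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

def power_by_formula(k):
    """Closed form for A^k when A = [[3, 2], [1, 4]], from A^k = P L^k P^(-1)."""
    f, t = 5 ** k, 2 ** k
    entries = [[f + 2 * t, 2 * f - 2 * t], [f - t, 2 * f + t]]
    # Each entry is divisible by 3 because 5^k and 2^k are congruent modulo 3.
    return [[e // 3 for e in row] for row in entries]

A = [[3, 2], [1, 4]]
print(power_by_formula(3), power_by_multiplication(A, 3))
```

The formula needs only two integer powers, whatever k is, while repeated multiplication needs k matrix products.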


If $f(t) = a_0 + a_1t + \cdots + a_nt^n$ is a polynomial in t with coefficients in F and if A is an
$n \times n$ matrix with entries in F, then f(A) will denote the matrix obtained by substituting A
formally for t.

(4.6.13)   $$f(A) = a_0I + a_1A + \cdots + a_nA^n.$$

The constant term $a_0$ gets replaced by $a_0I$. Then if $A = P\Lambda P^{-1}$,

(4.6.14)   $$f(A) = f(P\Lambda P^{-1}) = a_0I + a_1P\Lambda P^{-1} + \cdots + a_nP\Lambda^nP^{-1} = Pf(\Lambda)P^{-1}.$$
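Formula (4.6.14) can be tried out on the matrix with rows (3, 2) and (1, 4), whose eigenvalues are 5 and 2. The polynomial below is a hypothetical choice made only for the illustration, and the helper names are assumptions as well. Since $\Lambda$ is diagonal, $f(\Lambda)$ is obtained by applying f to each diagonal entry.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def poly_at_matrix(coeffs, A):
    """Evaluate a_0 I + a_1 A + ... + a_n A^n, with coeffs = [a_0, ..., a_n]."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
    total = [[0] * n for _ in range(n)]
    for a in coeffs:
        total = [[total[i][j] + a * power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, A)
    return total

A = [[3, 2], [1, 4]]
P = [[1, 2], [1, -1]]                      # columns are eigenvectors of A
P_inv = [[Fraction(1, 3), Fraction(2, 3)], [Fraction(1, 3), Fraction(-1, 3)]]
coeffs = [1, -2, 0, 1]                     # hypothetical f(t) = 1 - 2t + t^3

def f(x):
    return sum(a * x ** i for i, a in enumerate(coeffs))

f_Lambda = [[f(5), 0], [0, f(2)]]          # f applied to the diagonal entries
direct = poly_at_matrix(coeffs, A)
via_diagonal = mat_mul(mat_mul(P, f_Lambda), P_inv)
print(direct == via_diagonal)
```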


The analogous notation is used for linear operators: If T is a linear operator on a vector
space V over a field F, the linear operator f(T) on V is defined to be

(4.6.15)   $$f(T) = a_0I + a_1T + \cdots + a_nT^n,$$

where I denotes the identity operator. The operator f(T) acts on a vector by
$f(T)v = a_0v + a_1Tv + \cdots + a_nT^nv$. (In order to avoid too many parentheses we have omitted some
by writing Tv for T(v).)
4.7 JORDAN FORM


Suppose we are given a linear operator T on a finite-dimensional complex vector space
V. We have seen that, if the roots of its characteristic polynomial are distinct, there is
a basis of eigenvectors, and that the matrix of T with respect to that basis is diagonal.
Here we ask what can be done without assuming that the eigenvalues are distinct.
When the characteristic polynomial has multiple roots there will most often not be a
basis of eigenvectors, but we’ll see that, nevertheless, the matrix can be made fairly
simple.

An eigenvector with eigenvalue $\lambda$ of a linear operator T is a nonzero vector v such
that $(T - \lambda)v = 0$. (We will write $T - \lambda$ for $T - \lambda I$ here.) Since our operator T may not
have enough eigenvectors, we work with generalized eigenvectors.


• A generalized eigenvector with eigenvalue $\lambda$ of a linear operator T is a nonzero vector x
such that $(T - \lambda)^kx = 0$ for some k > 0. Its exponent is the smallest integer d such that
$(T - \lambda)^dx = 0$.


Proposition 4.7.1 Let x be a generalized eigenvector of T, with eigenvalue $\lambda$ and exponent
d, and for $j \geq 0$, let $u_j = (T - \lambda)^jx$. Let $B = (u_0, \ldots, u_{d-1})$, and let X = Span B. Then X
is a T-invariant subspace, and B is a basis of X.


We use the next lemma in the proof.


Lemma 4.7.2 With $u_j$ as above, a linear combination $y = c_ju_j + \cdots + c_{d-1}u_{d-1}$ with
$j \leq d - 1$ and $c_j \neq 0$ is a generalized eigenvector, with eigenvalue $\lambda$ and exponent $d - j$.


Proof. Since the exponent of x is d, $(T - \lambda)^{d-1}x = u_{d-1} \neq 0$. Therefore $(T - \lambda)^{d-j-1}y =
c_ju_{d-1}$ isn’t zero, but $(T - \lambda)^{d-j}y = 0$. So y is a generalized eigenvector with eigenvalue $\lambda$
and exponent $d - j$, as claimed. □


Proof of the Proposition. We note that

(4.7.3)   $$Tu_j = \begin{cases} \lambda u_j + u_{j+1} & \text{if } j < d - 1 \\ \lambda u_j & \text{if } j = d - 1 \\ 0 & \text{if } j > d - 1. \end{cases}$$

Therefore $Tu_j$ is in the subspace X for all j. This shows that X is invariant. Next, B
generates X by definition. The lemma shows that every nontrivial linear combination of B is
a generalized eigenvector, so it is not zero. Therefore B is an independent set. □


Corollary 4.7.4 Let x be a generalized eigenvector for T, with eigenvalue $\lambda$. Then $\lambda$ is an
ordinary eigenvalue, that is, a root of the characteristic polynomial of T.


Proof. If the exponent of x is d, then with notation as above, $u_{d-1}$ is an eigenvector with
eigenvalue $\lambda$. □
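Formula 4.7.3 determines the matrix that describes the action of T on the basis B of
Proposition 4.7.1. It is the $d \times d$ Jordan block $J_\lambda$. Jordan blocks are shown below for low
values of d:

(4.7.5)   $$J_\lambda = \begin{bmatrix} \lambda \end{bmatrix}, \quad \begin{bmatrix} \lambda & \\ 1 & \lambda \end{bmatrix}, \quad \begin{bmatrix} \lambda & & \\ 1 & \lambda & \\ & 1 & \lambda \end{bmatrix}, \quad \ldots$$

The operation of a Jordan block is especially simple when $\lambda = 0$. The $d \times d$ block $J_0$
operates on the standard basis of $\mathbb{C}^d$ as

(4.7.6)   $$e_1 \to e_2 \to \cdots \to e_d \to 0.$$

The $1 \times 1$ Jordan block $J_0$ is zero.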
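The action (4.7.6) is easy to reproduce. The sketch below builds a Jordan block with the 1's just below the diagonal, which is the convention implied by formula (4.7.3), and follows $e_1$ under repeated multiplication by $J_0$. The function names are assumptions made for this example.

```python
def jordan_block(lam, d):
    """d x d Jordan block: lam on the diagonal, 1's just below the diagonal."""
    return [[lam if i == j else (1 if i == j + 1 else 0) for j in range(d)]
            for i in range(d)]

def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

J0 = jordan_block(0, 4)
chain = [[1, 0, 0, 0]]                 # start with e_1
while any(chain[-1]):                  # stop once the zero vector is reached
    chain.append(mat_vec(J0, chain[-1]))
print(chain)                           # e_1, e_2, e_3, e_4, 0
```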


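The Jordan Decomposition Theorem below asserts that any complex $n \times n$ matrix is
similar to a matrix J made up of diagonal Jordan blocks (4.7.5); that is, it has the Jordan form

(4.7.7)   $$J = \begin{bmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_\ell \end{bmatrix},$$

where $J_i = J_{\lambda_i}$ for some $\lambda_i$. The blocks $J_i$ can have various sizes $d_i$, with $\sum d_i = n$,
and the diagonal entries $\lambda_i$ aren’t necessarily distinct. The characteristic polynomial of the
matrix J is

(4.7.8)   $$p(t) = (t - \lambda_1)^{d_1}(t - \lambda_2)^{d_2} \cdots (t - \lambda_\ell)^{d_\ell}.$$

The $2 \times 2$ and $3 \times 3$ Jordan forms are

(4.7.9)   $$\begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix}, \ \begin{bmatrix} \lambda_1 & \\ 1 & \lambda_1 \end{bmatrix}; \qquad \begin{bmatrix} \lambda_1 & & \\ & \lambda_2 & \\ & & \lambda_3 \end{bmatrix}, \ \begin{bmatrix} \lambda_1 & & \\ 1 & \lambda_1 & \\ & & \lambda_2 \end{bmatrix}, \ \begin{bmatrix} \lambda_1 & & \\ 1 & \lambda_1 & \\ & 1 & \lambda_1 \end{bmatrix},$$

where the scalars $\lambda_i$ may be equal or not, and in the fourth matrix, the blocks may be listed
in the other order.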


Theorem 4.7.10 Jordan Decomposition. 

(a) Vector space form: Let T be a linear operator on a finite-dimensional complex vector 
space V. There is a basis B of V such that the matrix of T with respect to B has Jordan 
form (4.7.7). 

(b) Matrix form: Let A be an $n \times n$ complex matrix. There is an invertible complex matrix P
such that $P^{-1}AP$ has Jordan form.

It is also true that the Jordan form of an operator T or a matrix A is unique except for the
order of the blocks.
Proof. This proof is due to Filippov [Filippov]. Induction on the dimension of V allows us
to assume that the theorem is true for the restriction of T to any proper invariant subspace.
So if V is the direct sum of proper T-invariant subspaces, say $V_1 \oplus \cdots \oplus V_r$, with r > 1, then
the theorem is true for T.


Suppose that we have generalized eigenvectors $v_i$, for $i = 1, \ldots, r$. Let $V_i$ be the
subspace defined as in Proposition 4.7.1, with $x = v_i$. If V is the direct sum $V_1 \oplus \cdots \oplus V_r$,
the theorem will be true for V, and we say that $v_1, \ldots, v_r$ are Jordan generators for T. We
will show that a set of Jordan generators exists.


Step 1: We choose an eigenvalue $\lambda$ of T, and replace the operator T by $T - \lambda I$. If A is the
matrix of T with respect to a basis, the matrix of $T - \lambda I$ with respect to the same basis will
be $A - \lambda I$, and if one of the matrices A or $A - \lambda I$ is in Jordan form, so is the other. So
replacing T by $T - \lambda I$ is permissible. Having done this, our operator, which we still call T,
will have zero as an eigenvalue. This will simplify the notation.


Step 2: We assume that 0 is an eigenvalue of T. Let $K_i$ and $U_i$ denote the kernel and image,
respectively, of the ith power $T^i$. Then $K_1 \subset K_2 \subset \cdots$ and $U_1 \supset U_2 \supset \cdots$. Because V is
finite-dimensional, these chains of subspaces become constant for large m, $K_m = K_{m+1} = \cdots$
and $U_m = U_{m+1} = \cdots$. Let $K = K_m$ and $U = U_m$. We verify that K and U are invariant
subspaces, and that V is the direct sum $K \oplus U$.

The subspaces are invariant because $TK_m \subset K_{m-1} \subset K_m$, and $TU_m = U_{m+1} = U_m$.
To show that $V = K \oplus U$, it suffices to show that $K \cap U = \{0\}$ (see Proposition 4.3.1(b)).
Let z be an element of $K \cap U$. Then $T^mz = 0$, and also $z = T^mv$ for some v in V. Therefore
$T^{2m}v = 0$, so v is an element of $K_{2m}$. But $K_{2m} = K_m$, so $T^mv = 0$, i.e., z = 0.

Since T has an eigenvalue 0, K is not the zero subspace. Therefore U has smaller
dimension than V, and by our induction assumption, the theorem is true for $T|_U$. Unfortunately,
we can’t use this reasoning on K, because U might be zero. So we must still prove
the existence of a Jordan form for $T|_K$. We replace V by K and T by $T|_K$.


• A linear operator T on a vector space V is called nilpotent if for some positive integer r,
the operator $T^r$ is zero.


We have reduced the proof to the case of a nilpotent operator. 


Step 3: We assume that our operator T is nilpotent. Every nonzero vector will be a generalized
eigenvector with eigenvalue 0. Let N and W denote the kernel and image of T, respectively.
Since T is nilpotent, $N \neq \{0\}$. Therefore the dimension of W is smaller than that of V,
and by induction, the theorem is true for the restriction of the operator to W. So there
are Jordan generators $w_1, \ldots, w_r$ for $T|_W$. Let $e_i$ denote the exponent of $w_i$, and let $W_i$
denote the subspace formed as in Proposition 4.7.1, using the generalized eigenvector $w_i$.
So $W = W_1 \oplus \cdots \oplus W_r$.

For each i, we choose an element $v_i$ of V such that $Tv_i = w_i$. The exponent $d_i$ of $v_i$
will be equal to $e_i + 1$. Let $V_i$ denote the subspace formed as in (4.7.1) using the vector $v_i$.
Then $TV_i = W_i$. Let U denote the sum $V_1 + \cdots + V_r$. Since each $V_i$ is an invariant subspace,
so is U. We now verify that $v_1, \ldots, v_r$ are Jordan generators for the restriction $T|_U$, i.e.,
that the subspaces $V_i$ are independent.
We notice two things: First, $TU = W$ because $TV_i = W_i$. Second, $V_i \cap N \subset W_i$. This
follows from Lemma 4.7.2, which shows that $V_i \cap N$ is the span of the last basis vector
$T^{d_i-1}v_i$. Since $d_i - 1 = e_i$, which is positive, $T^{d_i-1}v_i$ is in the image $W_i$.

We suppose given a relation $\tilde v_1 + \cdots + \tilde v_r = 0$, with $\tilde v_i$ in $V_i$. We must show that $\tilde v_i = 0$
for all i. Let $\tilde w_i = T\tilde v_i$. Then $\tilde w_1 + \cdots + \tilde w_r = 0$, and $\tilde w_i$ is in $W_i$. Since the subspaces $W_i$ are
independent, $\tilde w_i = 0$ for all i. So $T\tilde v_i = 0$, which means that $\tilde v_i$ is in $V_i \cap N$. Therefore $\tilde v_i$ is
in $W_i$. Using the fact that the subspaces $W_i$ are independent once more, we conclude that
$\tilde v_i = 0$ for all i.


Step 4: We show that a set of Jordan generators for T can be obtained by adding some
elements of N to the set $\{v_1, \ldots, v_r\}$ of Jordan generators for $T|_U$.

Let v be an arbitrary element of V and let Tv = w. Since TU = W, there is a vector u
in U such that Tu = w = Tv. Then z = v − u is in N and v = u + z. Therefore U + N = V.
This being so, we extend a basis of U to a basis of V by adding elements, say $z_1, \ldots, z_\ell$, of
N (see Proposition 3.4.16(a)). Let $N'$ be the span of $(z_1, \ldots, z_\ell)$. Then $U \cap N' = \{0\}$ and
$U + N' = V$, so V is the direct sum $U \oplus N'$.

The operator T is zero on $N'$, so $N'$ is an invariant subspace, and the matrix of $T|_{N'}$ is
the zero matrix, which has Jordan form. Its Jordan blocks are $1 \times 1$ zero matrices. Therefore
$\{v_1, \ldots, v_r;\ z_1, \ldots, z_\ell\}$ is a set of Jordan generators for T. □


It isn’t difficult to determine the Jordan form for an operator T, provided that the
eigenvalues are known, and the analysis also proves uniqueness of the form. However,
finding an appropriate basis of V can be painful, and is best avoided.

To determine the Jordan form, one chooses an eigenvalue $\lambda$, and replaces T by $T - \lambda I$,
to reduce to the case that $\lambda = 0$. Let $K_i$ denote the kernel of $T^i$, and let $k_i$ be the dimension
of $K_i$. In the case of a single $d \times d$ Jordan block with $\lambda = 0$, these dimensions are:

$$k_i^o = \begin{cases} i & \text{if } i \leq d \\ d & \text{if } i \geq d. \end{cases}$$

The dimensions $k_i$ for a general operator T are obtained by adding the numbers $k_i^o$ for
each block with $\lambda = 0$. So $k_1$ will be the number of blocks with $\lambda = 0$, $k_2 - k_1$ will be the
number of blocks of size $d \geq 2$ with $\lambda = 0$, and so on.


Two simple examples:

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & -1 & -1 \\ 2 & -2 & -2 \\ -1 & 1 & 1 \end{bmatrix}.$$

Here $A^3 = 0$, but $A^2 \neq 0$. If v is a vector such that $A^2v \neq 0$, for instance $v = e_1$, then
$(v, Tv, T^2v)$ will be a basis. The Jordan form consists of a single $3 \times 3$ block.
On the other hand, $B^2 = 0$. Taking $v = e_1$ again, the set $(v, Tv)$ is independent, and
this gives us a $2 \times 2$ block. To obtain the Jordan form, we have to add a vector in N, for
example $v' = e_1 + e_3$, which will give a $1 \times 1$ block (equal to zero). The required basis is
$(v, Tv, v')$.
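The recipe with kernel dimensions can be sketched as a short program. The number of Jordan blocks of size at least i is $k_i - k_{i-1}$, so the number of blocks of size exactly i is the difference of two consecutive such numbers. Everything below is an illustration under stated assumptions: the function names are invented for this example, ranks are computed by Gaussian elimination over the rationals, and the second test matrix is a hypothetical one with a block of size 2 and a block of size 1.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def block_sizes(A, lam):
    """Sizes of the Jordan blocks of A with eigenvalue lam, largest first."""
    n = len(A)
    T = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    k = [0]                                # k[i] = dim ker T^i, with k[0] = 0
    power = T
    while True:
        k_next = n - rank(power)
        if k_next == k[-1]:                # the chain of kernels has become constant
            break
        k.append(k_next)
        power = mat_mul(power, T)
    at_least = [k[i] - k[i - 1] for i in range(1, len(k))] + [0]
    sizes = []
    for i in range(1, len(k)):
        sizes += [i] * (at_least[i - 1] - at_least[i])   # blocks of size exactly i
    return sorted(sizes, reverse=True)

A = [[0, 1, 0], [1, 0, 1], [0, -1, 0]]
print(block_sizes(A, 0))                   # a single 3 x 3 block
M = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]      # hypothetical example
print(block_sizes(M, 0))                   # one block of size 2, one of size 1
```

If `lam` is not an eigenvalue, the kernel of $A - \lambda I$ is zero and the function returns an empty list.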




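It is often useful to write the Jordan form as J = D + N, where D is the diagonal part
of the matrix, and N is the part below the diagonal. For a single Jordan block, we will have
$D = \lambda I$ and $N = J_0$, as is illustrated below for a $3 \times 3$ block:

$$J_\lambda = \begin{bmatrix} \lambda & & \\ 1 & \lambda & \\ & 1 & \lambda \end{bmatrix} = \begin{bmatrix} \lambda & & \\ & \lambda & \\ & & \lambda \end{bmatrix} + \begin{bmatrix} 0 & & \\ 1 & 0 & \\ & 1 & 0 \end{bmatrix} = \lambda I + J_0 = D + N.$$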


Writing J = D + N is convenient because D and N commute. The powers of J can be
computed by the binomial expansion:

(4.7.11)   $$J^r = (D + N)^r = D^r + \binom{r}{1}D^{r-1}N + \binom{r}{2}D^{r-2}N^2 + \cdots.$$

When J is an $n \times n$ matrix, $N^n = 0$, and this expansion has at most n terms. In the case of a
single block, the formula reads

(4.7.12)   $$J_\lambda^r = (\lambda I + J_0)^r = \lambda^rI + \binom{r}{1}\lambda^{r-1}J_0 + \binom{r}{2}\lambda^{r-2}J_0^2 + \cdots.$$
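Formula (4.7.12) can be checked against repeated multiplication. The sketch below uses the fact that $J_0^m$ has 1's on the mth diagonal below the main one and zeros elsewhere, so the (i, j) entry of the rth power of a single block is $\binom{r}{i-j}\lambda^{r-(i-j)}$ when $0 \leq i - j \leq r$ and zero otherwise. The function names are assumptions made for this example.

```python
from math import comb

def jordan_block(lam, d):
    """d x d Jordan block: lam on the diagonal, 1's just below the diagonal."""
    return [[lam if i == j else (1 if i == j + 1 else 0) for j in range(d)]
            for i in range(d)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def power_by_multiplication(J, r):
    result = [[int(i == j) for j in range(len(J))] for i in range(len(J))]
    for _ in range(r):
        result = mat_mul(result, J)
    return result

def power_by_binomial(lam, d, r):
    """Entries of (lam I + J_0)^r read off from the binomial expansion."""
    return [[comb(r, i - j) * lam ** (r - (i - j)) if 0 <= i - j <= r else 0
             for j in range(d)] for i in range(d)]

print(power_by_binomial(2, 3, 5))
print(power_by_multiplication(jordan_block(2, 3), 5))
```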

Corollary 4.7.13 Let T be a linear operator on a finite-dimensional complex vector space.
The following conditions are equivalent:


(a) T is a diagonalizable operator,
(b) every generalized eigenvector is an eigenvector,
(c) all of the blocks in the Jordan form for T are $1 \times 1$ blocks.


The analogous statements are true for a square complex matrix A.


Proof. (a) ⇒ (b): Suppose that T is diagonalizable, say that the matrix of T with respect to
the basis $B = (v_1, \ldots, v_n)$ is the diagonal matrix $\Lambda$ with diagonal entries $\lambda_1, \ldots, \lambda_n$. Let
v be a generalized eigenvector in V, say that $(T - \lambda)^kv = 0$ for some $\lambda$ and some k > 0.
We replace T by $T - \lambda$ to reduce to the case that $T^kv = 0$. Let $X = (x_1, \ldots, x_n)^t$ be the
coordinate vector of v. The coordinates of $T^kv$ will be $\lambda_i^kx_i$. Since $T^kv = 0$, either $\lambda_i = 0$,
or $x_i = 0$, and in either case, $\lambda_ix_i = 0$. Therefore Tv = 0.


(b) ⇒ (c): We prove the contrapositive. If the Jordan form of T has a $k \times k$ Jordan block with
k > 1, then looking back at the action (4.7.6) of $J_\lambda - \lambda I$, we see that there is a generalized
eigenvector that is not an eigenvector. So if (c) is false, (b) is false too. Finally, it is clear that
(c) ⇒ (a). □


Here is a nice application of Jordan form. 


Theorem 4.7.14 Let T be a linear operator on a finite-dimensional complex vector space V.
If some positive power of T is the identity, say $T^r = I$, then T is diagonalizable.


Proof. It suffices to show that every generalized eigenvector is an eigenvector. To do this,
we assume that $(T - \lambda I)^2v = 0$ with $v \neq 0$, and we show that $(T - \lambda)v = 0$. Since $\lambda$ is an
eigenvalue and since $T^r = I$, $\lambda^r = 1$. We divide the polynomial $t^r - 1$ by $t - \lambda$:

$$t^r - 1 = (t^{r-1} + \lambda t^{r-2} + \cdots + \lambda^{r-2}t + \lambda^{r-1})(t - \lambda).$$

We substitute T for t and apply the operators to v. Let $w = (T - \lambda)v$. Since $T^r - I = 0$,

$$\begin{aligned} 0 = (T^r - I)v &= (T^{r-1} + \lambda T^{r-2} + \cdots + \lambda^{r-2}T + \lambda^{r-1})(T - \lambda)v \\ &= (T^{r-1} + \lambda T^{r-2} + \cdots + \lambda^{r-2}T + \lambda^{r-1})w \\ &= r\lambda^{r-1}w. \end{aligned}$$

(For the last equality, one uses the fact that $Tw = \lambda w$.) Since $r\lambda^{r-1}w = 0$ and $r\lambda^{r-1} \neq 0$, w = 0. □


We go back for a moment to the results of this section. Where has the hypothesis that 
V be a vector space over the complex numbers been used? The answer is that its only use is 
to ensure that the characteristic polynomial has enough roots. 


Corollary 4.7.15 Let V be a finite-dimensional vector space over a field F, and let T be a
linear operator on V whose characteristic polynomial factors into linear factors in F. The
Jordan Decomposition theorem 4.7.10 is true for T. □


The proof is identical to the one given for the case that $F = \mathbb{C}$.


Corollary 4.7.16 Let T be a linear operator on a finite-dimensional vector space over a field
F of characteristic zero. Assume that $T^r = I$ for some $r \geq 1$ and that the polynomial $t^r - 1$
factors into linear factors in F. Then T is diagonalizable. □


The characteristic zero hypothesis is needed to carry through the last step of the proof
of Theorem 4.7.14, where from the relation $r\lambda^{r-1}w = 0$ we want to conclude that w = 0.
The theorem is false in characteristic different from zero.
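A standard example of the failure, not taken from the text, is the matrix with rows (1, 1) and (0, 1) over the field with p elements. Its pth power is the identity, and $t^p - 1 = (t - 1)^p$ factors into linear factors, so 1 is its only eigenvalue. A diagonalizable matrix whose only eigenvalue is 1 must be the identity, which this matrix is not. The sketch below checks the arithmetic modulo p; the function names are assumptions made for the illustration.

```python
def mat_mul_mod(X, Y, p):
    """Product of two square integer matrices, with entries reduced modulo p."""
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) % p for j in range(n)]
            for i in range(n)]

def mat_pow_mod(A, r, p):
    result = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(r):
        result = mat_mul_mod(result, A, p)
    return result

I2 = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]
T = [[1, 1], [0, 1]]

for p in (2, 3, 5):
    N = [[(T[i][j] - I2[i][j]) % p for j in range(2)] for i in range(2)]   # N = T - I
    print(p,
          mat_pow_mod(T, p, p) == I2,        # T^p = I
          N != ZERO,                         # T is not the identity
          mat_mul_mod(N, N, p) == ZERO)      # but (T - I)^2 = 0
```

Here $e_2$ is a generalized eigenvector with eigenvalue 1 that is not an eigenvector, since $(T - I)e_2 = e_1$.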


—Yvonne Verdier²


EXERCISES 


Section 1 The Dimension Formula 
1.1. Let A be an $\ell \times m$ matrix and let B be an $n \times p$ matrix. Prove that the rule $M \rightsquigarrow AMB$
defines a linear transformation from the space $F^{m \times n}$ of $m \times n$ matrices to the space $F^{\ell \times p}$.
1.2. Let $v_1, \ldots, v_n$ be elements of a vector space V. Prove that the map $\varphi: F^n \to V$ defined
by $\varphi(X) = v_1x_1 + \cdots + v_nx_n$ is a linear transformation.
1.3. Let A be an $m \times n$ matrix. Use the dimension formula to prove that the space of solutions
of the linear system AX = 0 has dimension at least n − m.


1.4. Prove that every $m \times n$ matrix A of rank 1 has the form $A = XY^t$, where X, Y are m- and
n-dimensional column vectors. How uniquely determined are these vectors?


² I’ve received many emails asking about this rebus. Yvonne, an anthropologist, and her husband Jean-Louis, a
mathematician, were close friends who died tragically in 1989. In their memory, I included them among the people
quoted. The history of the valentine was one of Yvonne’s many interests, and she sent this rebus as a valentine.
1.5. (a) Let U and W be vector spaces over a field F. Show that the two operations
$(u, w) + (u', w') = (u + u', w + w')$ and $c(u, w) = (cu, cw)$ on pairs of vectors
make the product set $U \times W$ into a vector space. It is called the product space.

(b) Let U and W be subspaces of a vector space V. Show that the map $T: U \times W \to V$
defined by $T(u, w) = u + w$ is a linear transformation.
(c) Express the dimension formula for T in terms of the dimensions of subspaces of V.


Section 2 The Matrix of a Linear Transformation 


2.1. Let A and B be $2 \times 2$ matrices. Determine the matrix of the operator $T: M \rightsquigarrow AMB$ on the
space $F^{2 \times 2}$ of $2 \times 2$ matrices, with respect to the basis $(e_{11}, e_{12}, e_{21}, e_{22})$ of $F^{2 \times 2}$.


2.2. Let A be ann Xn matrix, and let V denote the space of n-dimensional row vectors. What 
is the matrix of the linear operator “right multiplication by A” with respect to the standard 
basis of V? 


2.3. Find all real 22 matrices that carry the line y = x to the line y = 3x. 
2.4. Prove Theorem 4.2.10(b) using row and column operations. 


2.5. 3Let A be an mXn matrix of rank r, let J be a set of r row indices such that the 
corresponding rows of A are independent, and let J be a set of ry column indices 
such that the corresponding columns of A are independent. Let M denote the rxr 
submatrix of A obtained by taking rows from / and columns from J. Prove that M is 
invertible. : 


Section 3 Linear Operators


3.1. Determine the dimensions of the kernel and the image of the linear operator $T$ on the
space $\mathbb{R}^n$ defined by $T(x_1, \ldots, x_n)^t = (x_1 + x_n, x_2 + x_{n-1}, \ldots, x_n + x_1)^t$.


3.2. (a) Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a real matrix, with $c$ not zero. Show that using conjugation by
elementary matrices, one can eliminate the "a" entry.
(b) Which matrices with $c = 0$ are similar to a matrix in which the "a" entry is zero?


3.3. Let $T: V \to V$ be a linear operator on a vector space of dimension 2. Assume that $T$ is not
multiplication by a scalar. Prove that there is a vector $v$ in $V$ such that $(v, T(v))$ is a basis
of $V$, and describe the matrix of $T$ with respect to that basis.


3.4. Let $B$ be a complex $n \times n$ matrix. Prove or disprove: The linear operator $T$ on the space of
all $n \times n$ matrices defined by $T(A) = AB - BA$ is singular.


Section 4 Eigenvectors 


4.1. Let $T$ be a linear operator on a vector space $V$, and let $\lambda$ be a scalar. The eigenspace $V^{(\lambda)}$
is the set of eigenvectors of $T$ with eigenvalue $\lambda$, together with $0$. Prove that $V^{(\lambda)}$ is a
$T$-invariant subspace.


4.2. (a) Let $T$ be a linear operator on a finite-dimensional vector space $V$, such that $T^2$ is the
identity operator. Prove that for any vector $v$ in $V$, $v - Tv$ is either an eigenvector with
eigenvalue $-1$, or the zero vector. With notation as in Exercise 4.1, prove that $V$ is the
direct sum of the eigenspaces $V^{(1)}$ and $V^{(-1)}$.
(b) Generalize this method to prove that a linear operator $T$ such that $T^4 = I$ decomposes
a complex vector space into a sum of four eigenspaces.


³ Suggested by Robert DeMarco.


4.3. Let $T$ be a linear operator on a vector space $V$. Prove that if $W_1$ and $W_2$ are $T$-invariant
subspaces of $V$, then $W_1 + W_2$ and $W_1 \cap W_2$ are $T$-invariant.


4.4. A $2 \times 2$ matrix $A$ has an eigenvector $v_1 = (1, 1)^t$ with eigenvalue 2 and also an eigenvector
$v_2 = (1, 2)^t$ with eigenvalue 3. Determine $A$.


4.5. Find all invariant subspaces of the real linear operator whose matrix is

(a) $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, (b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$.


4.6. Let $P$ be the real vector space of polynomials $p(x) = a_0 + a_1x + \cdots + a_nx^n$ of degree at
most $n$, and let $D$ denote the derivative $\frac{d}{dx}$, considered as a linear operator on $P$.

(a) Prove that $D$ is a nilpotent operator, meaning that $D^k = 0$ for sufficiently large $k$.
(b) Find the matrix of $D$ with respect to a convenient basis.
(c) Determine all $D$-invariant subspaces of $P$.


4.7. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a real $2 \times 2$ matrix. The condition that a column vector $X$ be an
eigenvector for left multiplication by $A$ is that $AX = Y$ be a scalar multiple of $X$, which
means that the slopes $s = x_2/x_1$ and $s' = y_2/y_1$ are equal.

(a) Find the equation in $s$ that expresses this equality.
(b) Suppose that the entries of $A$ are positive real numbers. Prove that there is an
eigenvector in the first quadrant and also one in the second quadrant.


4.8. Let $T$ be a linear operator on a finite-dimensional vector space for which every nonzero
vector is an eigenvector. Prove that $T$ is multiplication by a scalar.


Section 5 The Characteristic Polynomial 


5.1. Compute the characteristic polynomials and the complex eigenvalues and eigenvectors of

(a) $\begin{bmatrix} -2 & 2 \\ -2 & 3 \end{bmatrix}$, (b) $\begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix}$, (c) $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.


5.2. The characteristic polynomial of the matrix below is $t^3 - 4t - 1$. Determine the missing
entries.

$\begin{bmatrix} 0 & 1 & 2 \\ 1 & 1 & 0 \\ 1 & * & * \end{bmatrix}$


5.3. What complex numbers might be eigenvalues of a linear operator $T$ such that
(a) $T^r = I$, (b) $T^2 - 5T + 6I = 0$?


5.4. Find a recursive relation for the characteristic polynomial of the $k \times k$ matrix

$\begin{bmatrix} 0 & 1 & & & \\ 1 & 0 & 1 & & \\ & 1 & \ddots & \ddots & \\ & & \ddots & \ddots & 1 \\ & & & 1 & 0 \end{bmatrix}$

and compute the polynomial for $k \le 5$.


5.5. Which real $2 \times 2$ matrices have real eigenvalues? Prove that the eigenvalues are real if the
off-diagonal entries have the same sign.


5.6. Let $V$ be a vector space with basis $(v_0, \ldots, v_n)$ and let $a_0, \ldots, a_n$ be scalars. Define a linear
operator $T$ on $V$ by the rules $T(v_i) = v_{i+1}$ if $i < n$ and $T(v_n) = a_0v_0 + a_1v_1 + \cdots + a_nv_n$.
Determine the matrix of $T$ with respect to the given basis, and the characteristic polynomial
of $T$.


5.7. Do $A$ and $A^t$ have the same eigenvectors? the same eigenvalues?


5.8. Let $A = (a_{ij})$ be a $3 \times 3$ matrix. Prove that the coefficient of $t$ in the characteristic
polynomial is the sum of the symmetric $2 \times 2$ minors

$\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \det\begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix} + \det\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}.$


5.9. Consider the linear operator of left multiplication by an $m \times m$ matrix $A$ on the space
$F^{m \times m}$ of all $m \times m$ matrices. Determine the trace and the determinant of this operator.


5.10. Let $A$ and $B$ be $n \times n$ matrices. Determine the trace and the determinant of the operator
on the space $F^{n \times n}$ defined by $M \mapsto AMB$.


Section 6 Triangular and Diagonal Forms


6.1. Let $A$ be an $n \times n$ matrix whose characteristic polynomial factors into linear factors:
$p(t) = (t - \lambda_1) \cdots (t - \lambda_n)$. Prove that trace $A = \lambda_1 + \cdots + \lambda_n$, and that $\det A = \lambda_1 \cdots \lambda_n$.


6.2. Suppose that a complex $n \times n$ matrix $A$ has distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, and let
$v_1, \ldots, v_n$ be eigenvectors with these eigenvalues.

(a) Show that every eigenvector is a multiple of one of the vectors $v_i$.
(b) Show how one can recover the matrix from the eigenvalues and eigenvectors.


6.3. Let $T$ be a linear operator that has two linearly independent eigenvectors with the same
eigenvalue $\lambda$. Prove that $\lambda$ is a multiple root of the characteristic polynomial of $T$.


6.4. Let $A$ = [$2 \times 2$ matrix not legible in the source]. Find a matrix $P$ such that $P^{-1}AP$ is diagonal, and find a formula for the
matrix $A^{30}$.


6.5. In each case, find a complex matrix $P$ such that $P^{-1}AP$ is diagonal.

(a) [matrix not legible in the source], (b) $\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$, (c) $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.


6.6. Suppose that $A$ is diagonalizable. Can the diagonalization be done with a matrix $P$ in the
special linear group?


6.7. Prove that if $A$ and $B$ are $n \times n$ matrices and $A$ is nonsingular, then $AB$ is similar to $BA$.


6.8. A linear operator $T$ is nilpotent if some positive power $T^k$ is zero. Prove that $T$ is nilpotent
if and only if there is a basis of $V$ such that the matrix of $T$ is upper triangular, with
diagonal entries zero.


6.9. Find all real $2 \times 2$ matrices such that $A^2 = I$, and describe geometrically the way they
operate by left multiplication on $\mathbb{R}^2$.


6.10. Let $M$ be a matrix made up of two diagonal blocks: $M = \begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix}$. Prove that $M$ is
diagonalizable if and only if $A$ and $D$ are diagonalizable.


6.11. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a $2 \times 2$ matrix with eigenvalue $\lambda$.

(a) Show that unless it is zero, the vector $(b, \lambda - a)^t$ is an eigenvector.
(b) Find a matrix $P$ such that $P^{-1}AP$ is diagonal, assuming that $b \neq 0$ and that $A$ has distinct
eigenvalues.


Section 7 Jordan Form 


7.1. Determine the Jordan form of the matrix $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$.


7.2. Prove that $A = \begin{bmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 1 & 1 & 1 \end{bmatrix}$ is an idempotent matrix, i.e., that $A^2 = A$, and find its
Jordan form.


7.3. Let $V$ be a complex vector space of dimension 5, and let $T$ be a linear operator on $V$
whose characteristic polynomial is $(t - \lambda)^5$. Suppose that the rank of the operator $T - \lambda I$
is 2. What are the possible Jordan forms for $T$?


7.4. (a) Determine all possible Jordan forms for a matrix whose characteristic polynomial is
$(t + 2)^2(t - 5)^3$.

(b) What are the possible Jordan forms for a matrix whose characteristic polynomial is
$(t + 2)^2(t - 5)^3$, when the space of eigenvectors with eigenvalue $-2$ is one-dimensional,
and the space of eigenvectors with eigenvalue 5 is two-dimensional?


7.5. What is the Jordan form of a matrix $A$ all of whose eigenvectors are multiples of a single
vector?


7.6. Determine all invariant subspaces of a linear operator whose Jordan form consists of one
block.


7.7. Is every complex square matrix $A$ such that $A^2 = A$ diagonalizable?


7.8. Is every complex square matrix $A$ similar to its transpose?


7.9. Find a $2 \times 2$ matrix with entries in $\mathbb{F}_p$ that has a power equal to the identity and an
eigenvalue in $\mathbb{F}_p$, but is not diagonalizable.
Miscellaneous Problems


M.1. Let $v = (a_1, \ldots, a_n)$ be a real row vector. We may form the $n! \times n$ matrix $M$ whose rows
are obtained by permuting the entries of $v$ in all possible ways. The rows can be listed in
an arbitrary order. Thus if $n = 3$, $M$ might be

$\begin{bmatrix} a_1 & a_2 & a_3 \\ a_1 & a_3 & a_2 \\ a_2 & a_3 & a_1 \\ a_2 & a_1 & a_3 \\ a_3 & a_1 & a_2 \\ a_3 & a_2 & a_1 \end{bmatrix}.$

Determine the possible ranks that such a matrix could have.


M.2. Let $A$ be a complex $n \times n$ matrix with $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$. Assume that $\lambda_1$
is the largest eigenvalue, that is, that $|\lambda_1| > |\lambda_i|$ for all $i > 1$.

(a) Prove that for most vectors $X$, the sequence $X_k = \lambda_1^{-k}A^kX$ converges to an
eigenvector $Y$ with eigenvalue $\lambda_1$, and describe precisely what the conditions on $X$
are for this to be true.

(b) Prove the same thing without assuming that the eigenvalues $\lambda_1, \ldots, \lambda_n$ are distinct.


M.3. Compute the largest eigenvalue of the matrix [$2 \times 2$ matrix not legible in the source] to three-place accuracy, using a
method based on Exercise M.2.


M.4. If $X = (x_1, x_2, \ldots)$ is an infinite real row vector and $A = (a_{ij})$, $0 < i, j < \infty$ is an infinite
real matrix, one may or may not be able to define the matrix product $XA$. For which $A$ can
one define right multiplication on the space $\mathbb{R}^\infty$ of all infinite row vectors (3.7.1)? on the
space $Z$ (3.7.2)?


*M.5. Let $\varphi: F^n \to F^m$ be left multiplication by an $m \times n$ matrix $A$.

(a) Prove that the following are equivalent:
• $A$ has a right inverse, a matrix $B$ such that $AB = I$,
• $\varphi$ is surjective,
• the rank of $A$ is $m$.

(b) Prove that the following are equivalent:
• $A$ has a left inverse, a matrix $B$ such that $BA = I$,
• $\varphi$ is injective,
• the rank of $A$ is $n$.


M.6. Without using the characteristic polynomial, prove that a linear operator on a vector space
of dimension $n$ can have at most $n$ distinct eigenvalues.


*M.7. (powers of an operator) Let $T$ be a linear operator on a vector space $V$. Let $K_r$ and $W_r$
denote the kernel and image, respectively, of $T^r$.

(a) Show that $K_1 \subset K_2 \subset \cdots$ and that $W_1 \supset W_2 \supset \cdots$.
(b) The following conditions might or might not hold for a particular value of $r$:
(1) $K_r = K_{r+1}$, (2) $W_r = W_{r+1}$, (3) $W_r \cap K_1 = \{0\}$, (4) $W_1 + K_r = V$.
Find all implications among the conditions (1)-(4) when $V$ is finite-dimensional.
(c) Do the same thing when $V$ is infinite-dimensional.


M.8. Let $T$ be a linear operator on a finite-dimensional complex vector space $V$.

(a) Let $\lambda$ be an eigenvalue of $T$, and let $V_\lambda$ be the set of generalized eigenvectors, together
with the zero vector. Prove that $V_\lambda$ is a $T$-invariant subspace of $V$. (This subspace is
called a generalized eigenspace.)

(b) Prove that $V$ is the direct sum of its generalized eigenspaces.


M.9. Let $V$ be a finite-dimensional vector space. A linear operator $T: V \to V$ is called a
projection if $T^2 = T$ (not necessarily an "orthogonal projection"). Let $K$ and $W$ be the
kernel and image of a linear operator $T$. Prove

(a) $T$ is a projection onto $W$ if and only if the restriction of $T$ to $W$ is the identity map.
(b) If $T$ is a projection, then $V$ is the direct sum $W \oplus K$.
(c) The trace of a projection $T$ is equal to its rank.


M.10. Let $A$ and $B$ be $m \times n$ and $n \times m$ real matrices.

(a) Prove that if $\lambda$ is a nonzero eigenvalue of the $m \times m$ matrix $AB$ then it is also an
eigenvalue of the $n \times n$ matrix $BA$. Show by example that this need not be true if
$\lambda = 0$.

(b) Prove that $I_m - AB$ is invertible if and only if $I_n - BA$ is invertible.


CHAPTER 5


Applications of Linear Operators 


By relieving the brain from all unnecessary work, 
a good notation sets it free to concentrate 
on more advanced problems. 


—Alfred North Whitehead 


5.1 ORTHOGONAL MATRICES AND ROTATIONS 


In this section, the field of scalars is the real number field. 
We assume familiarity with the dot product of vectors in $\mathbb{R}^2$. The dot product of column
vectors $X = (x_1, \ldots, x_n)^t$, $Y = (y_1, \ldots, y_n)^t$ in $\mathbb{R}^n$ is defined to be


(5.1.1)   $(X \cdot Y) = x_1y_1 + \cdots + x_ny_n.$


It is convenient to write the dot product as the matrix product of a row vector and a column 
vector: 


(5.1.2)   $(X \cdot Y) = X^tY.$
For vectors in $\mathbb{R}^2$, one has the formula
(5.1.3)   $(X \cdot Y) = |X||Y|\cos\theta,$
where $\theta$ is the angle between the vectors. This formula follows from the law of cosines
(5.1.4)   $c^2 = a^2 + b^2 - 2ab\cos\theta$


for the side lengths $a$, $b$, $c$ of a triangle, where $\theta$ is the angle between the sides $a$ and $b$.
To derive (5.1.3), we apply the law of cosines to the triangle with vertices $0$, $X$, $Y$. Its side
lengths are $|X|$, $|Y|$, and $|X - Y|$, so the law of cosines can be written as


$((X - Y) \cdot (X - Y)) = (X \cdot X) + (Y \cdot Y) - 2|X||Y|\cos\theta.$


The left side expands to $(X \cdot X) - 2(X \cdot Y) + (Y \cdot Y)$, and formula (5.1.3) is obtained by
comparing this with the right side. The formula is valid for vectors in $\mathbb{R}^n$ too, but it requires
understanding the meaning of the angle, and we won't take the time to go into that just now
(see (8.5.2)).


The most important points for vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ are


• the square $|X|^2$ of the length of a vector $X$ is $(X \cdot X) = X^tX$, and
• a vector $X$ is orthogonal to another vector $Y$, written $X \perp Y$, if and only if
$X^tY = 0$.


We take these as the definitions of the length $|X|$ of a vector and of orthogonality of
vectors in $\mathbb{R}^n$. Note that the length $|X|$ is positive unless $X$ is the zero vector, because
$|X|^2 = X^tX = x_1^2 + \cdots + x_n^2$ is a sum of squares.
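As a small numerical illustration of these definitions, the following Python sketch computes a dot product, a length, and the angle obtained by solving (5.1.3) for $\theta$. The function names `dot`, `length`, and `angle` are ours, chosen for illustration, and vectors are represented as plain lists of floats.

```python
import math

def dot(x, y):
    """Dot product (X . Y) = x_1*y_1 + ... + x_n*y_n, as in (5.1.1)."""
    return sum(a * b for a, b in zip(x, y))

def length(x):
    """Length |X|, defined by |X|^2 = (X . X)."""
    return math.sqrt(dot(x, x))

def angle(x, y):
    """Angle between two nonzero vectors, solving (5.1.3) for theta."""
    return math.acos(dot(x, y) / (length(x) * length(y)))

X = [3.0, 4.0]
Y = [-4.0, 3.0]
print(length(X))                       # the length of X
print(dot(X, Y))                       # zero, so X is orthogonal to Y
print(angle([1.0, 0.0], [1.0, 1.0]))   # close to pi/4
```

The angle computation assumes both vectors are nonzero; for vectors that are nearly parallel, rounding may push the quotient slightly outside $[-1, 1]$, which this sketch does not guard against.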


Theorem 5.1.5 Pythagoras. If $X \perp Y$ and $Z = X + Y$, then $|Z|^2 = |X|^2 + |Y|^2$.
This is proved by expanding $Z^tZ$. If $X \perp Y$, then $X^tY = Y^tX = 0$, so
$Z^tZ = (X + Y)^t(X + Y) = X^tX + X^tY + Y^tX + Y^tY = X^tX + Y^tY$. □


We switch to our lowercase vector notation. If $v_1, \ldots, v_k$ are orthogonal vectors in $\mathbb{R}^n$
and if $w = v_1 + \cdots + v_k$, then Pythagoras's theorem shows by induction that


(5.1.6)   $|w|^2 = |v_1|^2 + \cdots + |v_k|^2.$

Lemma 5.1.7 Any set $(v_1, \ldots, v_k)$ of orthogonal nonzero vectors in $\mathbb{R}^n$ is independent.


Proof. Let $w = c_1v_1 + \cdots + c_kv_k$ be a linear combination, where not all $c_i$ are zero, and let
$w_i = c_iv_i$. Then $w$ is the sum $w_1 + \cdots + w_k$ of orthogonal vectors, not all of which are zero.
By Pythagoras, $|w|^2 = |w_1|^2 + \cdots + |w_k|^2 > 0$, so $w \neq 0$. □


• An orthonormal basis $\mathbf{B} = (v_1, \ldots, v_n)$ of $\mathbb{R}^n$ is a basis of orthogonal unit vectors (vectors
of length one). Another way to say this is that $\mathbf{B}$ is an orthonormal basis if


(5.1.8)   $(v_i \cdot v_j) = \delta_{ij},$

where $\delta_{ij}$, the Kronecker delta, is the $i,j$-entry of the identity matrix, which is equal to 1 if
$i = j$ and to 0 if $i \neq j$.

Definition 5.1.9 A real $n \times n$ matrix $A$ is orthogonal if $A^tA = I$, which is to say, $A$ is invertible
and its inverse is $A^t$.


Lemma 5.1.10 An $n \times n$ matrix $A$ is orthogonal if and only if its columns form an orthonormal
basis of $\mathbb{R}^n$.


Proof. Let $A_i$ denote the $i$th column of $A$. Then $A_i^t$ is the $i$th row of $A^t$. The $i,j$-entry of $A^tA$
is $A_i^tA_j$, so $A^tA = I$ if and only if $A_i^tA_j = \delta_{ij}$ for all $i$ and $j$. □
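Definition 5.1.9 can be checked directly on small examples. The sketch below forms $A^tA$ and compares it with the identity up to a rounding tolerance. The helper names and the tolerance `tol` are our own choices, and matrices are stored as lists of rows.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Matrix product of a (m x p) and b (p x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_orthogonal(a, tol=1e-12):
    """Check A^t A = I entry by entry, up to the tolerance tol."""
    n = len(a)
    p = matmul(transpose(a), a)
    return all(abs(p[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

# Columns (0.6, 0.8) and (-0.8, 0.6) are orthogonal unit vectors.
A = [[0.6, -0.8],
     [0.8,  0.6]]
# Columns (1, 0) and (1, 1) are independent but not orthonormal.
B = [[1.0, 1.0],
     [0.0, 1.0]]
print(is_orthogonal(A), is_orthogonal(B))
```

The first matrix passes because its columns form an orthonormal basis, in line with Lemma 5.1.10; the second fails because $B^tB$ has off-diagonal entries equal to 1.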


The next properties of orthogonal matrices are easy to verify: 
Proposition 5.1.11


(a) The product of orthogonal matrices is orthogonal, and the inverse of an orthogonal
matrix, its transpose, is orthogonal. The orthogonal matrices form a subgroup $O_n$ of
$GL_n$, the orthogonal group.

(b) The determinant of an orthogonal matrix is $\pm 1$. The orthogonal matrices with determinant
1 form a subgroup $SO_n$ of $O_n$ of index 2, the special orthogonal group. □


Definition 5.1.12 An orthogonal operator $T$ on $\mathbb{R}^n$ is a linear operator that preserves the dot
product: For every pair $X$, $Y$ of vectors,


$(TX \cdot TY) = (X \cdot Y).$


Proposition 5.1.13 A linear operator $T$ on $\mathbb{R}^n$ is orthogonal if and only if it preserves lengths
of vectors, or, if and only if for every vector $X$, $(TX \cdot TX) = (X \cdot X)$.


Proof. Suppose that lengths are preserved, and let $X$ and $Y$ be arbitrary vectors in $\mathbb{R}^n$.
Then
$(T(X + Y) \cdot T(X + Y)) = ((X + Y) \cdot (X + Y)).$


The fact that $(TX \cdot TY) = (X \cdot Y)$ follows by expanding the two sides of this equality and
cancelling. □
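For readers who want the cancellation spelled out, here is the expansion written as a short derivation. It uses only the linearity of $T$ and the bilinearity and symmetry of the dot product.

```latex
\begin{align*}
(T(X+Y)\cdot T(X+Y)) &= (TX\cdot TX) + 2\,(TX\cdot TY) + (TY\cdot TY),\\
((X+Y)\cdot (X+Y))   &= (X\cdot X) + 2\,(X\cdot Y) + (Y\cdot Y).
\end{align*}
% Lengths are preserved, so (TX . TX) = (X . X) and (TY . TY) = (Y . Y).
% Subtracting these equal terms from the two equal left sides leaves
\[
2\,(TX\cdot TY) = 2\,(X\cdot Y), \qquad\text{hence}\qquad (TX\cdot TY) = (X\cdot Y).
\]
```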


Proposition 5.1.14 A linear operator $T$ on $\mathbb{R}^n$ is orthogonal if and only if its matrix $A$ with
respect to the standard basis is an orthogonal matrix.


Proof. If $A$ is the matrix of $T$, then
$(TX \cdot TY) = (AX)^t(AY) = X^t(A^tA)Y.$


The operator is orthogonal if and only if the right side is equal to $X^tY$ for all $X$ and $Y$. We
can write this condition as $X^t(A^tA - I)Y = 0$. The next lemma shows that this is true if and
only if $A^tA - I = 0$, and therefore $A$ is orthogonal. □


Lemma 5.1.15 Let $M$ be an $n \times n$ matrix. If $X^tMY = 0$ for all column vectors $X$ and $Y$, then
$M = 0$.


Proof. The product $e_i^tMe_j$ evaluates to the $i,j$-entry of $M$. For instance,


$\begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = m_{21}.$


If $e_i^tMe_j = 0$ for all $i$ and $j$, then $M = 0$. □
We now describe the orthogonal $2 \times 2$ matrices.


• A linear operator $T$ on $\mathbb{R}^2$ is a reflection if it has orthogonal eigenvectors $v_1$ and $v_2$ with
eigenvalues 1 and $-1$, respectively.
Because it fixes $v_1$ and changes the sign of the orthogonal vector $v_2$, such an operator
reflects the plane about the one-dimensional subspace spanned by $v_1$. Reflection about the
$e_1$-axis is given by the matrix


(5.1.16)   $S_0 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$


Theorem 5.1.17 


(a) The orthogonal $2 \times 2$ matrices with determinant 1 are the matrices


(5.1.18)   $R = \begin{bmatrix} c & -s \\ s & c \end{bmatrix},$


with $c = \cos\theta$ and $s = \sin\theta$, for some angle $\theta$. The matrix $R$ represents counterclockwise
rotation of the plane $\mathbb{R}^2$ about the origin and through the angle $\theta$.


(b) The orthogonal $2 \times 2$ matrices $A$ with determinant $-1$ are the matrices

(5.1.19)   $S = \begin{bmatrix} c & s \\ s & -c \end{bmatrix} = RS_0,$


with $c$ and $s$ as above. The matrix $S$ reflects the plane about the one-dimensional
subspace of $\mathbb{R}^2$ that makes an angle $\tfrac{1}{2}\theta$ with the $e_1$-axis.

Proof. Say that

$A = \begin{bmatrix} c & * \\ s & * \end{bmatrix}$

is orthogonal. Then its columns are unit vectors (5.1.10), so the point $(c, s)^t$ lies on the unit
circle, and $c = \cos\theta$ and $s = \sin\theta$, for some angle $\theta$. We inspect the product $P = R^tA$, where
$R$ is the matrix (5.1.18):


(5.1.20)   $P = R^tA = \begin{bmatrix} 1 & * \\ 0 & * \end{bmatrix}.$

Since $R^t$ and $A$ are orthogonal, so is $P$. Lemma 5.1.10 tells us that the second column is a unit
vector orthogonal to the first one. So


(5.1.21)   $P = \begin{bmatrix} 1 & 0 \\ 0 & \pm 1 \end{bmatrix}.$


Working back, $A = RP$, so $A = R$ if $\det A = 1$ and $A = S = RS_0$ if $\det A = -1$.

We've seen that $R$ represents a rotation (4.2.2), but we must still identify the operator
defined by the matrix $S$. The characteristic polynomial of $S$ is $t^2 - 1$, so its eigenvalues are
1 and $-1$. Let $X_1$ and $X_2$ be unit-length eigenvectors with these eigenvalues. Because $S$ is
orthogonal,

$(X_1 \cdot X_2) = (SX_1 \cdot SX_2) = (X_1 \cdot -X_2) = -(X_1 \cdot X_2).$

It follows that $(X_1 \cdot X_2) = 0$. The eigenvectors are orthogonal. The span of $X_1$ will be the
line of reflection. To determine this line, we write a unit vector $X$ as $(c', s')^t$, with $c' = \cos\alpha$
and $s' = \sin\alpha$. Then


$SX = \begin{bmatrix} cc' + ss' \\ sc' - cs' \end{bmatrix} = \begin{bmatrix} \cos(\theta - \alpha) \\ \sin(\theta - \alpha) \end{bmatrix}.$

When $\alpha = \tfrac{1}{2}\theta$, $X$ is an eigenvector with eigenvalue 1, a fixed vector. □
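The description of $S$ in Theorem 5.1.17(b) is easy to test numerically. The sketch below builds the matrix (5.1.19) for a sample angle and applies it to the unit vector at angle $\tfrac{1}{2}\theta$ and to the unit vector orthogonal to it. The function names and the sample angle are illustrative choices of ours.

```python
import math

def matvec(m, v):
    """Apply a matrix (list of rows) to a column vector (list)."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def reflection(theta):
    """The matrix S of (5.1.19), with c = cos(theta) and s = sin(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s],
            [s, -c]]

theta = 1.2                      # sample angle, chosen arbitrarily
S = reflection(theta)
fixed = [math.cos(theta / 2), math.sin(theta / 2)]      # on the line of reflection
flipped = [-math.sin(theta / 2), math.cos(theta / 2)]   # orthogonal to that line

print(matvec(S, fixed))     # agrees with `fixed` up to rounding
print(matvec(S, flipped))   # agrees with the negative of `flipped` up to rounding
```

The two printed vectors exhibit the eigenvalues 1 and $-1$ found in the proof: the first vector is fixed, and the second changes sign.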


We describe the $3 \times 3$ rotation matrices next.


Definition 5.1.22 A rotation of $\mathbb{R}^3$ about the origin is a linear operator $\rho$ with these
properties:

• $\rho$ fixes a unit vector $u$, called a pole of $\rho$, and

• $\rho$ rotates the two-dimensional subspace $W$ orthogonal to $u$.


The axis of rotation is the line $\ell$ spanned by $u$. We also call the identity operator a rotation,
though its axis is indeterminate.
If multiplication by a $3 \times 3$ matrix $R$ is a rotation of $\mathbb{R}^3$, $R$ is called a rotation matrix.


(5.1.23)   A Rotation of $\mathbb{R}^3$. [Figure]


The sign of the angle of rotation depends on how the subspace $W$ is oriented. We'll orient
$W$ looking at it from the head of the arrow $u$. The angle $\theta$ shown in the figure is positive.
(This is the "right hand rule.")

When $u$ is the vector $e_1$, the set $(e_2, e_3)$ will be a basis for $W$, and the matrix of $\rho$ will
have the form


(5.1.24)   $M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{bmatrix},$


where the bottom right $2 \times 2$ minor is the rotation matrix (5.1.18).


• A rotation that is not the identity is described by the pair $(u, \theta)$, called a spin, that consists
of a pole $u$ and a nonzero angle of rotation $\theta$.


The rotation with spin $(u, \theta)$ may be denoted by $\rho_{(u,\theta)}$. Every rotation $\rho$ different
from the identity has two poles, the intersections of the axis of rotation $\ell$ with the unit sphere
in $\mathbb{R}^3$. These are the unit-length eigenvectors of $\rho$ with eigenvalue 1. The choice of a pole
$u$ defines a direction on $\ell$, and a change of direction causes a change of sign in the angle
of rotation. If $(u, \theta)$ is a spin of $\rho$, so is $(-u, -\theta)$. Thus every rotation has two spins, and
$\rho_{(u,\theta)} = \rho_{(-u,-\theta)}$.


Theorem 5.1.25 Euler's Theorem. The $3 \times 3$ rotation matrices are the orthogonal $3 \times 3$
matrices with determinant 1, the elements of the special orthogonal group $SO_3$.


Euler's Theorem has a remarkable consequence, which follows from the fact that $SO_3$ is a
group. It is not obvious, either algebraically or geometrically.


Corollary 5.1.26 The composition of rotations about any two axes is a rotation about some
other axis. □


Because their elements represent rotations, the groups $SO_2$ and $SO_3$ are called the
two- and three-dimensional rotation groups. Things become more complicated in dimension
greater than 3. The $4 \times 4$ matrix


(5.1.27)   $\begin{bmatrix} \cos\alpha & -\sin\alpha & & \\ \sin\alpha & \cos\alpha & & \\ & & \cos\beta & -\sin\beta \\ & & \sin\beta & \cos\beta \end{bmatrix}$


is an element of $SO_4$. Left multiplication by this matrix rotates the two-dimensional subspace
spanned by $(e_1, e_2)$ through the angle $\alpha$, and it rotates the subspace spanned by $(e_3, e_4)$
through the angle $\beta$.

Before beginning the proof of Euler’s Theorem, we note two more consequences: 


Corollary 5.1.28 Let $M$ be the matrix in $SO_3$ that represents the rotation $\rho_{(u,\alpha)}$ with
spin $(u, \alpha)$.
(a) The trace of $M$ is $1 + 2\cos\alpha$.


(b) Let $B$ be another element of $SO_3$, and let $u' = Bu$. The conjugate $M' = BMB^t$ represents
the rotation $\rho_{(u',\alpha)}$ with spin $(u', \alpha)$.


Proof. (a) We choose an orthonormal basis $(v_1, v_2, v_3)$ of $\mathbb{R}^3$ such that $v_1 = u$. The matrix
of $\rho$ with respect to this new basis will have the form (5.1.24), and its trace will be $1 + 2\cos\alpha$.
Since the trace doesn't depend on the basis, the trace of $M$ is $1 + 2\cos\alpha$ too.


(b) Since $SO_3$ is a group, $M'$ is an element of $SO_3$. Euler's Theorem tells us that $M'$ is a
rotation matrix. Moreover, $u'$ is a pole of this rotation: Since $B$ is orthogonal, $u' = Bu$ has
length 1, and

$M'u' = BMB^{-1}u' = BMu = Bu = u'.$


Let $\alpha'$ be the angle of rotation of $M'$ about the pole $u'$. The traces of $M$ and its conjugate
$M'$ are equal, so $\cos\alpha = \cos\alpha'$. This implies that $\alpha' = \pm\alpha$. Euler's Theorem tells us that
the matrix $B$ also represents a rotation, say with angle $\beta$ about some pole. Since $B$ and $M'$
depend continuously on $\beta$, only one of the two values $\pm\alpha$ for $\alpha'$ can occur. When $\beta = 0$,
$B = I$, $M' = M$, and $\alpha' = \alpha$. Therefore $\alpha' = \alpha$ for all $\beta$. □
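Both parts of Corollary 5.1.28 can be observed on a concrete example. In the sketch below, `rot_e1` produces the matrix (5.1.24), and `rot_e3` is the analogous rotation about $e_3$; that second matrix, the helper names, and the sample angles are assumptions of ours, not taken from the text.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def trace(m):
    return sum(m[i][i] for i in range(3))

def rot_e1(alpha):
    """Matrix (5.1.24): rotation with pole e_1 and angle alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_e3(beta):
    """Analogous rotation with pole e_3 (our assumption for the example)."""
    c, s = math.cos(beta), math.sin(beta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

alpha = 0.9
M = rot_e1(alpha)                             # spin (e_1, alpha)
B = rot_e3(0.4)                               # another element of SO_3
M_conj = matmul(matmul(B, M), transpose(B))   # M' = B M B^t
u_prime = matvec(B, [1.0, 0.0, 0.0])          # u' = B e_1

print(trace(M), trace(M_conj), 1 + 2 * math.cos(alpha))   # three equal values
print(matvec(M_conj, u_prime), u_prime)                   # M' fixes u'
```

The check confirms that the trace is unchanged by conjugation and that $u'$ is a pole of $M'$. It does not by itself decide the sign of the angle; that is what the continuity argument in the proof is for.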


Lemma 5.1.29 A $3 \times 3$ orthogonal matrix $M$ with determinant 1 has an eigenvalue
equal to 1.


Proof. To show that 1 is an eigenvalue, we show that the determinant of the matrix $M - I$
is zero. If $B$ is an $n \times n$ matrix, $\det(-B) = (-1)^n\det B$. We are dealing with $3 \times 3$ matrices, so
$\det(M - I) = -\det(I - M)$. Also, $\det(M - I)^t = \det(M - I)$ and $\det M = 1$. Then


$\det(M - I) = \det(M - I)^t = \det M \det(M - I)^t = \det(M(M^t - I)) = \det(I - M).$
Combined with $\det(M - I) = -\det(I - M)$, the relation $\det(M - I) = \det(I - M)$ shows that $\det(M - I) = 0$. □


Proof of Euler's Theorem. Suppose that $M$ represents a rotation $\rho$ with spin $(u, \alpha)$. We
form an orthonormal basis $\mathbf{B}$ of $V$ by appending to $u$ an orthonormal basis of its orthogonal
space $W$. The matrix $M'$ of $\rho$ with respect to this basis will have the form (5.1.24), which
is orthogonal and has determinant 1. Moreover, $M = PM'P^{-1}$, where the matrix $P$ is equal
to $[\mathbf{B}]$ (3.5.13). Since its columns are orthonormal, $[\mathbf{B}]$ is orthogonal. Therefore $M$ is also
orthogonal, and its determinant is equal to 1.


Conversely, let $M$ be an orthogonal matrix with determinant 1, and let $T$ denote left
multiplication by $M$. Let $u$ be a unit-length eigenvector with eigenvalue 1, and let $W$ be the
two-dimensional space orthogonal to $u$. Since $T$ is an orthogonal operator that fixes $u$, it
sends $W$ to itself. So $W$ is a $T$-invariant subspace, and we can restrict the operator to $W$.

Since $T$ is orthogonal, it preserves lengths (5.1.13), so its restriction to $W$ is orthogonal
too. Now $W$ has dimension 2, and we know the orthogonal operators in dimension 2: they are
the rotations and the reflections (5.1.17). The reflections are operators with determinant $-1$.
If an operator $T$ acts on $W$ as a reflection and fixes the orthogonal vector $u$, its determinant
will be $-1$ too. Since this is not the case, $T|_W$ is a rotation. This verifies the second condition
of Definition 5.1.22, and shows that $T$ is a rotation. □
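Corollary 5.1.26 and Lemma 5.1.29 can be watched in action by composing two rotations about different axes. The sketch below multiplies a rotation about $e_1$ by a rotation about $e_3$, checks that the product has determinant 1 and that $\det(M - I) = 0$, reads off the angle from the trace (Corollary 5.1.28(a)), and extracts a pole. The pole is recovered from the skew-symmetric part $M - M^t$, a standard identity that is not derived in the text and that applies only when $\sin\alpha \neq 0$. All names and sample angles are our own.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def rot_e1(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_e3(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

M = matmul(rot_e1(0.7), rot_e3(1.1))          # composition of two rotations
M_minus_I = [[M[i][j] - (1.0 if i == j else 0.0) for j in range(3)]
             for i in range(3)]

# Angle from the trace: trace M = 1 + 2 cos(alpha).
alpha = math.acos((M[0][0] + M[1][1] + M[2][2] - 1.0) / 2.0)

# Pole from the skew part (assumed identity): M - M^t = 2 sin(alpha) [u]_x.
w = [M[2][1] - M[1][2], M[0][2] - M[2][0], M[1][0] - M[0][1]]
norm = math.sqrt(sum(x * x for x in w))
u = [x / norm for x in w]

print(det3(M), det3(M_minus_I))   # close to 1 and close to 0
print(alpha, u, matvec(M, u))     # M fixes the unit vector u
```

Since `acos` returns an angle in $[0, \pi]$, the pole computed this way is the one for which the angle of rotation is nonnegative; the opposite pole corresponds to the spin $(-u, -\alpha)$.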


5.2 USING CONTINUITY 


Various facts about complex matrices can be deduced by diagonalization, using reasoning 
based on continuity that we explain here. 


A sequence $A_k$ of $n \times n$ matrices converges to an $n \times n$ matrix $A$ if for every $i$ and $j$, the
$i,j$-entry of $A_k$ converges to the $i,j$-entry of $A$. Similarly, a sequence $p_k(t)$, $k = 1, 2, \ldots$, of
polynomials of degree $n$ with complex coefficients converges to a polynomial $p(t)$ of degree
$n$ if for every $j$, the coefficient of $t^j$ in $p_k$ converges to the corresponding coefficient of $p$. We
may indicate that a sequence $S_k$ of complex numbers, matrices, or polynomials converges to
$S$ by writing $S_k \to S$.


Proposition 5.2.1 Continuity of Roots. Let $p_k(t)$ be a sequence of monic polynomials of
degree $\le n$, and let $p(t)$ be another monic polynomial of degree $n$. Let $\alpha_{k,1}, \ldots, \alpha_{k,n}$ and
$\alpha_1, \ldots, \alpha_n$ denote the roots of these polynomials.
(a) If $\alpha_{k,\nu} \to \alpha_\nu$ for $\nu = 1, \ldots, n$, then $p_k \to p$.


(b) Conversely, if $p_k \to p$, the roots $\alpha_{k,\nu}$ of $p_k$ can be numbered in such a way that
$\alpha_{k,\nu} \to \alpha_\nu$ for each $\nu = 1, \ldots, n$.


In part (b), the roots of each polynomial $p_k$ must be renumbered individually.


Proof. We note that $p_k(t) = (t - \alpha_{k,1}) \cdots (t - \alpha_{k,n})$ and $p(t) = (t - \alpha_1) \cdots (t - \alpha_n)$. Part
(a) follows from the fact that the coefficients of $p(t)$ are continuous functions (polynomial
functions) of the roots, but (b) is less obvious.


Step 1: Let $\alpha_{k,\nu}$ be a root of $p_k$ nearest to $\alpha_1$, i.e., such that $|\alpha_{k,\nu} - \alpha_1|$ is minimal. We
renumber the roots of $p_k$ so that this root becomes $\alpha_{k,1}$. Then


$|\alpha_1 - \alpha_{k,1}|^n \le |(\alpha_1 - \alpha_{k,1}) \cdots (\alpha_1 - \alpha_{k,n})| = |p_k(\alpha_1)|.$


The right side converges to $|p(\alpha_1)| = 0$. Therefore the left side does too, and this shows that
$\alpha_{k,1} \to \alpha_1$.


Step 2: We divide, writing $p_k(t) = (t - \alpha_{k,1})q_k(t)$ and $p(t) = (t - \alpha_1)q(t)$. Then $q_k$ and
$q$ are monic polynomials, and their roots are $\alpha_{k,2}, \ldots, \alpha_{k,n}$ and $\alpha_2, \ldots, \alpha_n$, respectively.
If we show that $q_k \to q$, then by induction on the degree $n$, we will be able to arrange the
roots of $q_k$ so that they converge to the roots of $q$, and we will be done.

To show that $q_k \to q$, we carry the division out explicitly. To simplify notation,
we drop the subscript 1 from $\alpha_1$. Say that $p(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0$, that
$q(t) = t^{n-1} + b_{n-2}t^{n-2} + \cdots + b_1t + b_0$, and that the notation for $p_k$ and $q_k$ is analogous.
The equation $p(t) = (t - \alpha)q(t)$ implies that


$b_{n-2} = \alpha + a_{n-1},$

$b_{n-3} = \alpha^2 + \alpha a_{n-1} + a_{n-2},$

$\vdots$

$b_0 = \alpha^{n-1} + \alpha^{n-2}a_{n-1} + \cdots + \alpha a_2 + a_1.$
Since $\alpha_{k,1} \to \alpha$ and $a_{k,j} \to a_j$, it is true that $b_{k,j} \to b_j$. □
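The explicit division in Step 2 amounts to the recurrence $b_{j-1} = a_j + \alpha b_j$, started from the leading coefficient 1 of $q$. A minimal Python sketch follows; the function name `deflate` and the convention of listing coefficients from $a_0$ upward, with the leading 1 left implicit, are our own.

```python
def deflate(a, alpha):
    """Divide the monic polynomial t^n + a[n-1] t^(n-1) + ... + a[0] by (t - alpha).

    Returns (b, remainder), where b = [b_0, ..., b_(n-2)] lists the lower
    coefficients of the monic quotient q, and remainder equals p(alpha).
    """
    n = len(a)
    b = [0] * (n - 1)
    carry = 1                          # leading coefficient of q
    for j in range(n - 1, 0, -1):
        carry = a[j] + alpha * carry   # b_(j-1) = a_j + alpha * b_j
        b[j - 1] = carry
    remainder = a[0] + alpha * carry   # zero exactly when alpha is a root of p
    return b, remainder

# p(t) = t^3 - 6t^2 + 11t - 6 = (t - 1)(t - 2)(t - 3), divided by (t - 1).
print(deflate([-6, 11, -6], 1))        # ([6, -5], 0), i.e. q(t) = t^2 - 5t + 6
```

Each $b_j$ is a polynomial expression in $\alpha$ and the $a_j$, which is why convergence of $\alpha_{k,1}$ and of the coefficients $a_{k,j}$ forces convergence of the $b_{k,j}$.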


Proposition 5.2.2 Let $A$ be an $n \times n$ complex matrix.


(a) There is a sequence of matrices $A_k$ that converges to $A$, and such that for all $k$ the
characteristic polynomial $p_k(t)$ of $A_k$ has distinct roots.

(b) If a sequence $A_k$ of matrices converges to $A$, the sequence $p_k(t)$ of its characteristic
polynomials converges to the characteristic polynomial $p(t)$ of $A$.

(c) Let $\lambda_i$ be the roots of the characteristic polynomial $p$. If $A_k \to A$, the roots $\lambda_{k,i}$ of $p_k$
can be numbered so that $\lambda_{k,i} \to \lambda_i$ for each $i$.


Proof. (a) Proposition 4.6.1 tells us that there is an invertible $n \times n$ matrix $P$ such that
$A' = P^{-1}AP$ is upper triangular. Its eigenvalues will be the diagonal entries of that matrix.
We let $A'_k$ be a sequence of matrices that converges to $A'$, whose off-diagonal entries are the
same as those of $A'$, and whose diagonal entries are distinct. Then $A'_k$ is upper triangular, and
its characteristic polynomial has distinct roots. Let $A_k = PA'_kP^{-1}$. Since matrix multiplication
is continuous, $A_k \to A$. The characteristic polynomial of $A_k$ is the same as that of $A'_k$, so it
has distinct roots.


Part (b) follows from (a) because the coefficients of the characteristic polynomial depend
continuously on the matrix entries, and then (c) follows from Proposition 5.2.1. □


One can use continuity to prove the famous Cayley-Hamilton Theorem. We state the 
theorem in its matrix form. 


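Theorem 5.2.3 Cayley-Hamilton Theorem. Let $p(t) = t^n + c_{n-1}t^{n-1} + \cdots + c_1t + c_0$ be the
characteristic polynomial of an $n \times n$ complex matrix $A$. Then $p(A) = A^n + c_{n-1}A^{n-1} +
\cdots + c_1A + c_0I$ is the zero matrix.


For example, the characteristic polynomial of the $2 \times 2$ matrix $A$, with entries $a, b, c, d$
as usual, is $t^2 - (a + d)t + (ad - bc)$ (4.5.12). The theorem asserts that


(5.2.4)   $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^2 - (a + d)\begin{bmatrix} a & b \\ c & d \end{bmatrix} + (ad - bc)\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$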


This is easy to verify. 
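One way to carry out the verification of (5.2.4) is to evaluate its left side for specific entries. The sketch below does this with integer arithmetic, so the result is exact; the function names are ours.

```python
def matmul2(x, y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def cayley_hamilton_residual(a, b, c, d):
    """Left side of (5.2.4): A^2 - (a + d) A + (ad - bc) I."""
    A = [[a, b], [c, d]]
    A2 = matmul2(A, A)
    tr, det = a + d, a * d - b * c
    I = [[1, 0], [0, 1]]
    return [[A2[i][j] - tr * A[i][j] + det * I[i][j] for j in range(2)]
            for i in range(2)]

print(cayley_hamilton_residual(3, 2, 1, 4))   # [[0, 0], [0, 0]]
```

For the entries above, $A^2$ has rows $(11, 14)$ and $(7, 18)$, the trace is 7, and the determinant is 10, so each entry of the left side cancels to zero.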


Proof of the Cayley-Hamilton Theorem. Step 1: The case that $A$ is a diagonal matrix.
Let the diagonal entries be $\lambda_1, \ldots, \lambda_n$. The characteristic polynomial is


$p(t) = (t - \lambda_1) \cdots (t - \lambda_n).$


Here $p(A)$ is also a diagonal matrix, and its diagonal entries are $p(\lambda_i)$. Since $\lambda_i$ are the
roots of $p$, $p(\lambda_i) = 0$ and $p(A) = 0$.


Step 2: The case that the eigenvalues of $A$ are distinct.
In this case, $A$ is diagonalizable; say $A' = P^{-1}AP$ is diagonal. Then the characteristic
polynomial of $A'$ is the same as the characteristic polynomial $p(t)$ of $A$, and moreover,


$p(A) = Pp(A')P^{-1}$
(see (4.6.14)). By step 1, $p(A') = 0$, so $p(A) = 0$.


Step 3: The general case.

We apply Proposition 5.2.2. We let $A_k$ be a sequence of matrices with distinct
eigenvalues that converges to $A$. Let $p_k$ be the characteristic polynomial of $A_k$. Since the
sequence $p_k$ converges to the characteristic polynomial $p$ of $A$, $p_k(A_k) \to p(A)$. Step 2
tells us that $p_k(A_k) = 0$ for all $k$. Therefore $p(A) = 0$. □
5.3 SYSTEMS OF DIFFERENTIAL EQUATIONS


We learn in calculus that the solutions of the differential equation


(5.3.1)   $\dfrac{dx}{dt} = ax$

are $x(t) = ce^{at}$, where $c$ is an arbitrary real number. We review the proof because we want
to use the argument again. First, $ce^{at}$ does solve the equation. To show that every solution
has this form, let $x(t)$ be an arbitrary solution. We differentiate $e^{-at}x(t)$ using the product
rule:


(5.3.2)   $\dfrac{d}{dt}\bigl(e^{-at}x(t)\bigr) = (-ae^{-at})x(t) + e^{-at}(ax(t)) = 0.$
Thus $e^{-at}x(t)$ is a constant $c$, and $x(t) = ce^{at}$.
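To extend this solution to systems of constant coefficient differential equations, we use
the following terminology. A vector-valued function or matrix-valued function is a vector or
a matrix whose entries are functions of $t$:


(5.3.3)   $X(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}, \quad A(t) = \begin{bmatrix} a_{11}(t) & \cdots & a_{1n}(t) \\ \vdots & & \vdots \\ a_{m1}(t) & \cdots & a_{mn}(t) \end{bmatrix}.$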

The calculus operations of taking limits and differentiating are extended to vector-valued
and matrix-valued functions by performing the operations on each entry separately.
The derivative of a vector-valued or matrix-valued function is the function obtained by
differentiating each entry:


(5.3.4)   $\dfrac{dX}{dt} = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}, \quad \dfrac{dA}{dt} = \begin{bmatrix} a_{11}'(t) & \cdots & a_{1n}'(t) \\ \vdots & & \vdots \\ a_{m1}'(t) & \cdots & a_{mn}'(t) \end{bmatrix},$


where $x_i'(t)$ is the derivative of $x_i(t)$, and so on. So $\frac{dX}{dt}$ is defined if and only if each of the
functions $x_i(t)$ is differentiable. The derivative can also be described in matrix notation:


(5.3.5)   $\dfrac{dX}{dt} = \lim_{h \to 0} \dfrac{X(t + h) - X(t)}{h}.$


Here $X(t + h) - X(t)$ is computed by vector addition and the $h$ in the denominator stands
for scalar multiplication by $h^{-1}$. The limit is obtained by evaluating the limit of each entry
separately. So the entries of (5.3.5) are the derivatives $x_i'(t)$. The analogous statement is true
for matrix-valued functions.
Many elementary properties of differentiation carry over to matrix-valued functions.
The product rule, whose proof is an exercise, is an example: 


Lemma 5.3.6 Product Rule. 


(a) Let $A(t)$ and $B(t)$ be differentiable matrix-valued functions of t, of suitable sizes so
that their product is defined. Then the matrix product $A(t)B(t)$ is differentiable, and its
derivative is

$$\frac{d(AB)}{dt} = \frac{dA}{dt}B + A\frac{dB}{dt}.$$

(b) Let $A_1, \ldots, A_k$ be differentiable matrix-valued functions of t, of suitable sizes so that
their product is defined. Then the matrix product $A_1 \cdots A_k$ is differentiable, and its
derivative is

$$\frac{d}{dt}(A_1 \cdots A_k) = \sum_{i=1}^{k} A_1 \cdots A_{i-1}\,\frac{dA_i}{dt}\,A_{i+1} \cdots A_k.$$


A system of homogeneous linear, first-order, constant-coefficient differential equations
is a matrix equation of the form

(5.3.7) $$\frac{dX}{dt} = AX,$$

where A is a constant n×n matrix and $X(t)$ is an n-dimensional vector-valued function.
Writing out such a system, we obtain a system of n differential equations

(5.3.8) $$\begin{aligned} \frac{dx_1}{dt} &= a_{11}x_1(t) + \cdots + a_{1n}x_n(t) \\ &\;\;\vdots \\ \frac{dx_n}{dt} &= a_{n1}x_1(t) + \cdots + a_{nn}x_n(t). \end{aligned}$$

The $x_i(t)$ are unknown functions, and the scalars $a_{ij}$ are given. For example, if

(5.3.9) $$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix},$$

(5.3.7) becomes a system of two equations in two unknowns:

(5.3.10) $$\begin{aligned} \frac{dx_1}{dt} &= 3x_1 + 2x_2 \\ \frac{dx_2}{dt} &= x_1 + 4x_2. \end{aligned}$$


The simplest systems are those in which A is a diagonal matrix with diagonal entries
$\lambda_i$. Then equation (5.3.8) reads

(5.3.11) $$\frac{dx_i}{dt} = \lambda_i x_i(t), \qquad i = 1, \ldots, n.$$

Here the unknown functions $x_i$ are not mixed up by the equations, so we can solve for each
one separately:

(5.3.12) $$x_i(t) = c_i e^{\lambda_i t},$$

for some arbitrary constants $c_i$.
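The observation that allows us to solve the differential equation (5.3.7) in many cases
is this: If V is an eigenvector for A with eigenvalue $\lambda$, i.e., if $AV = \lambda V$, then

(5.3.13) $$X = e^{\lambda t}V$$

is a particular solution of (5.3.7). Here $e^{\lambda t}V$ must be interpreted as the product of the
variable scalar $e^{\lambda t}$ and the constant vector V. Differentiation operates on the scalar function,
fixing V, while multiplication by A operates on the vector V, fixing the scalar $e^{\lambda t}$. Thus
$\frac{d}{dt}e^{\lambda t}V = \lambda e^{\lambda t}V$ and also $Ae^{\lambda t}V = \lambda e^{\lambda t}V$. For example,

$$\begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 2 \\ -1 \end{bmatrix}$$

are eigenvectors of the matrix (5.3.9), with eigenvalues 5 and 2, respectively, and

(5.3.14) $$\begin{bmatrix} e^{5t} \\ e^{5t} \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 2e^{2t} \\ -e^{2t} \end{bmatrix}$$

solve the system (5.3.10).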


This observation allows us to solve (5.3.7) whenever the matrix A has distinct real 
eigenvalues. In that case every solution will be a linear combination of the special solutions 
(5.3.13). To work this out, it is convenient to diagonalize. 


Proposition 5.3.15 Let A be an n×n matrix, and let P be an invertible matrix such that
$\Lambda = P^{-1}AP$ is diagonal, with diagonal entries $\lambda_1, \ldots, \lambda_n$. The general solution of the system
$\frac{dX}{dt} = AX$ is $X = P\tilde{X}$, where $\tilde{X} = (c_1e^{\lambda_1 t}, \ldots, c_ne^{\lambda_n t})^t$ solves the equation $\frac{d\tilde{X}}{dt} = \Lambda\tilde{X}$.

The coefficients $c_i$ are arbitrary. They are often determined by assigning initial
conditions, that is, the value of X at some particular $t_0$.

Proof. We multiply the equation $\frac{d\tilde{X}}{dt} = \Lambda\tilde{X}$ by P: $P\frac{d\tilde{X}}{dt} = P\Lambda\tilde{X} = AP\tilde{X}$. But since P is
constant, $P\frac{d\tilde{X}}{dt} = \frac{d(P\tilde{X})}{dt} = \frac{dX}{dt}$. Thus $\frac{dX}{dt} = AX$. This reasoning can be reversed, so $\tilde{X}$ solves
the equation with $\Lambda$ if and only if X solves the equation with A. □


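The matrix that diagonalizes the coefficient matrix of the system (5.3.10) was computed before (4.6.8):

(5.3.16) $$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}, \quad P = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}, \quad\text{and}\quad \Lambda = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}.$$

Thus

(5.3.17) $$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = P\tilde{X} = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} c_1e^{5t} \\ c_2e^{2t} \end{bmatrix} = \begin{bmatrix} c_1e^{5t} + 2c_2e^{2t} \\ c_1e^{5t} - c_2e^{2t} \end{bmatrix}.$$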


In other words, every solution is a linear combination of the two basic solutions (5.3.14). 
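
The general solution can be checked numerically. The sketch below, in Python, takes the system (5.3.10) with eigenvalues 5 and 2 and eigenvectors $(1,1)^t$ and $(2,-1)^t$, fits the constants $c_1, c_2$ to an initial value $X(0)$, and measures how far $dX/dt$ is from $AX$. The function names and the sample initial condition are our own illustrative choices, not part of the text.

```python
import math

A = [[3.0, 2.0], [1.0, 4.0]]   # coefficient matrix of the system (5.3.10)


def solution(c1, c2, t):
    """X(t) = c1 e^{5t} (1, 1)^t + c2 e^{2t} (2, -1)^t."""
    return [c1 * math.exp(5 * t) + 2 * c2 * math.exp(2 * t),
            c1 * math.exp(5 * t) - c2 * math.exp(2 * t)]


def derivative(c1, c2, t):
    """dX/dt, obtained by differentiating each entry of X(t)."""
    return [5 * c1 * math.exp(5 * t) + 4 * c2 * math.exp(2 * t),
            5 * c1 * math.exp(5 * t) - 2 * c2 * math.exp(2 * t)]


def coefficients_from_initial(x0):
    """Solve c1 + 2 c2 = x0[0], c1 - c2 = x0[1] for the constants."""
    c2 = (x0[0] - x0[1]) / 3
    c1 = x0[1] + c2
    return c1, c2


def residual(c1, c2, t):
    """Largest entry of dX/dt - AX in absolute value; zero up to rounding."""
    x = solution(c1, c2, t)
    dx = derivative(c1, c2, t)
    ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return max(abs(dx[i] - ax[i]) for i in range(2))


c1, c2 = coefficients_from_initial([1.0, 0.0])
print(solution(c1, c2, 0.0), residual(c1, c2, 0.3))
```

Fitting the constants amounts to solving $P C = X(0)$, which is why an initial condition determines the solution uniquely.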


We now consider the case that the coefficient matrix A has distinct eigenvalues, but 
that they are not all real. To copy the method used above, we first consider differential 
equations of the form (5.3.1), in which a is a complex number. Properly interpreted, the
solutions of such a differential equation still have the form $ce^{at}$. The only thing to remember
is that $e^{at}$ will now be a complex-valued function of the real variable t.

The definition of the derivative of a complex-valued function is the same as for real-valued
functions, provided that the limit (5.3.5) exists. There are no new features. We can
write any such function $x(t)$ in terms of its real and imaginary parts, which will be real-valued
functions, say

(5.3.18) $$x(t) = p(t) + iq(t).$$

Then x is differentiable if and only if p and q are differentiable, and if they are, the derivative
of x is $p' + iq'$. This follows directly from the definition. The usual rules for differentiation,
such as the product rule, hold for complex-valued functions. These rules can be proved 
either by applying the corresponding theorem for real functions to p and q, or by copying 
the proof for real functions. 

The exponential of a complex number a = r + si is defined to be 


(5.3.19) $$e^a = e^{r+si} = e^r(\cos s + i\sin s).$$

Differentiation of this formula shows that $de^{at}/dt = ae^{at}$. Therefore $ce^{at}$ solves the
differential equation (5.3.1), and the proof given at the beginning of the section shows that
these are the only solutions.

Having extended the case of one equation to complex coefficients, we can use diago- 
nalization to solve a system of equations (5.3.7) when A is a complex matrix with distinct 
eigenvalues. 


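For example, let $A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$. The vectors $v_1 = \begin{bmatrix} 1 \\ i \end{bmatrix}$ and $v_2 = \begin{bmatrix} i \\ 1 \end{bmatrix}$ are eigenvectors,
with eigenvalues $1 + i$ and $1 - i$, respectively. Let B denote the basis $(v_1, v_2)$. Then A is
diagonalized by the matrix $P = [B]$:

(5.3.20) $$P^{-1}AP = \frac{1}{2}\begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix} = \begin{bmatrix} 1+i & 0 \\ 0 & 1-i \end{bmatrix} = \Lambda.$$

Then $\tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix} = \begin{bmatrix} c_1e^{(1+i)t} \\ c_2e^{(1-i)t} \end{bmatrix}$. The solutions of (5.3.7) are

(5.3.21) $$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = P\tilde{X} = \begin{bmatrix} c_1e^{(1+i)t} + ic_2e^{(1-i)t} \\ ic_1e^{(1+i)t} + c_2e^{(1-i)t} \end{bmatrix},$$

where $c_1, c_2$ are arbitrary complex numbers. So every solution is a linear combination of the
two basic solutions

(5.3.22) $$\begin{bmatrix} e^{(1+i)t} \\ ie^{(1+i)t} \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} ie^{(1-i)t} \\ e^{(1-i)t} \end{bmatrix}.$$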

However, these solutions aren’t very satisfactory, because we began with a system of 
differential equations with real coefficients, and the answer we obtained is complex. When 
the equation is real, we will want the real solutions. We note the following lemma: 


Lemma 5.3.23 Let A be a real n×n matrix, and let $X(t)$ be a complex-valued solution of
the differential equation $\frac{dX}{dt} = AX$. The real and imaginary parts of $X(t)$ solve the same
equation. □


Now every solution of the original equation (5.3.7), whether real or complex, has the 
form (5.3.21) for some complex numbers c;. So the real solutions are among those we have 
found. To write them down explicitly, we may take the real and imaginary parts of the 
complex solutions. 

The real and imaginary parts of the basic solutions (5.3.22) are determined using
(5.3.19). They are

(5.3.24) $$\begin{bmatrix} e^t\cos t \\ -e^t\sin t \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} e^t\sin t \\ e^t\cos t \end{bmatrix}.$$

Every real solution is a real linear combination of these particular solutions.
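
To see the passage from complex to real solutions concretely, the following Python sketch compares the real and imaginary parts of the complex solution $e^{(1+i)t}(1, i)^t$ with the two real vector functions $(e^t\cos t, -e^t\sin t)^t$ and $(e^t\sin t, e^t\cos t)^t$, and then checks with a central difference quotient that the first real function satisfies $dX/dt = AX$ for $A$ with rows $(1, 1)$ and $(-1, 1)$. The function names and the step size `h` are illustrative assumptions.

```python
import cmath
import math


def complex_basic_solution(t):
    """The complex solution e^{(1+i)t} (1, i)^t."""
    z = cmath.exp((1 + 1j) * t)
    return [z, 1j * z]


def real_basic_solutions(t):
    """Real and imaginary parts of the complex solution above."""
    u = [math.exp(t) * math.cos(t), -math.exp(t) * math.sin(t)]
    v = [math.exp(t) * math.sin(t), math.exp(t) * math.cos(t)]
    return u, v


def max_gap(t):
    """How far (Re X, Im X) is from (u, v); should be rounding error only."""
    x = complex_basic_solution(t)
    u, v = real_basic_solutions(t)
    return max(max(abs(x[i].real - u[i]), abs(x[i].imag - v[i]))
               for i in range(2))


def ode_residual(t, h=1e-5):
    """Compare a central difference quotient of u with A u, A = [[1, 1], [-1, 1]]."""
    u_plus, _ = real_basic_solutions(t + h)
    u_minus, _ = real_basic_solutions(t - h)
    u, _ = real_basic_solutions(t)
    du = [(u_plus[i] - u_minus[i]) / (2 * h) for i in range(2)]
    au = [u[0] + u[1], -u[0] + u[1]]
    return max(abs(du[i] - au[i]) for i in range(2))


print(max_gap(0.7), ode_residual(0.7))
```

The difference quotient is only an approximation of the derivative, so the residual is small but not exactly zero.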


5.4 THE MATRIX EXPONENTIAL 


Systems of first-order linear, constant-coefficient differential equations can be solved
formally, using the matrix exponential.

The exponential of an n×n real or complex matrix A is the matrix obtained by
substituting A for x and I for 1 into the Taylor's series for $e^x$, which is

(5.4.1) $$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots.$$

Thus by definition,

(5.4.2) $$e^A = I + \frac{A}{1!} + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots.$$

We will be interested mainly in the matrix-valued function $e^{tA}$ of the variable scalar t,
so we substitute tA for A:

(5.4.3) $$e^{tA} = I + \frac{tA}{1!} + \frac{t^2A^2}{2!} + \frac{t^3A^3}{3!} + \cdots.$$


Theorem 5.4.4

(a) The series (5.4.2) converges absolutely and uniformly on bounded sets of complex
matrices.

(b) $e^{tA}$ is a differentiable function of t, and its derivative is the matrix product $Ae^{tA}$.

(c) Let A and B be complex n×n matrices that commute: $AB = BA$. Then $e^{A+B} = e^Ae^B$.


In order not to break up the discussion, we have moved the proof of this theorem to the end 
of the section. 


The hypothesis that A and B commute is essential for carrying the fundamental
property $e^{x+y} = e^xe^y$ over to matrices. Nevertheless, (c) is very useful.


Corollary 5.4.5 For any n×n complex matrix A, the exponential $e^A$ is invertible, and its
inverse is $e^{-A}$.


Proof. Because A and $-A$ commute, $e^Ae^{-A} = e^{A-A} = e^0 = I$. □


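Since matrix multiplication is relatively complicated, it is often not easy to write down
the entries of the matrix $e^A$. They won't be obtained by exponentiating the entries of A unless
A is a diagonal matrix. If A is diagonal, with diagonal entries $\lambda_1, \ldots, \lambda_n$, then inspection of
the series shows that $e^A$ is also diagonal, and that its diagonal entries are $e^{\lambda_i}$.

The exponential is also fairly easy to compute for a triangular 2×2 matrix. For
example, if

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix},$$

then

(5.4.6) $$e^A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1}{1!}\begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} + \frac{1}{2!}\begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix} + \cdots = \begin{bmatrix} e & * \\ 0 & e^2 \end{bmatrix}.$$
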


It is a good exercise to calculate the missing entry * directly from the series. 
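
For readers who want to experiment, here is a short Python sketch that sums the first terms of the exponential series for a matrix. It is only a numerical illustration: the function name `mat_exp`, the cutoff of 40 terms, and the sample upper triangular matrix with rows $(1, 1)$ and $(0, 2)$ are our assumptions, and a truncated series is not how one would compute exponentials of large or badly scaled matrices in practice.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, terms=40):
    """Partial sum I + A/1! + ... + A^(terms-1)/(terms-1)! of the exponential series."""
    n = len(a)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]   # holds A^k / k!, starting with k = 0
    for k in range(1, terms):
        # A^k / k! = (A^(k-1) / (k-1)!) * A / k
        term = [[entry / k for entry in row] for row in mat_mul(term, a)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


A = [[1.0, 1.0], [0.0, 2.0]]   # sample triangular matrix (an assumption)
E = mat_exp(A)
for row in E:
    print(row)
```

The diagonal entries of the output can be compared with $e$ and $e^2$, and the off-diagonal entry gives a numerical value against which a hand computation from the series can be checked.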
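The exponential $e^A$ can be determined whenever we know a matrix P such that
$\Lambda = P^{-1}AP$ is diagonal. Using the rule $P^{-1}A^kP = (P^{-1}AP)^k$ (4.6.12) and the distributive law
for matrix multiplication,

(5.4.7) $$P^{-1}e^AP = (P^{-1}IP) + \frac{(P^{-1}AP)}{1!} + \frac{(P^{-1}AP)^2}{2!} + \cdots = e^{P^{-1}AP}.$$

Suppose that $\Lambda$ is diagonal, with diagonal entries $\lambda_i$. Then $e^\Lambda$ is also diagonal, and its
diagonal entries are $e^{\lambda_i}$. In this case we can compute $e^A$ explicitly:

(5.4.8) $$e^A = Pe^\Lambda P^{-1}.$$

For example, if $A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$ and $P = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, then $P^{-1}AP = \Lambda = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$. So

$$e^A = Pe^\Lambda P^{-1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} e & 0 \\ 0 & e^2 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} e & e^2 - e \\ 0 & e^2 \end{bmatrix}.$$

The next theorem relates the matrix exponential to differential equations: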


Theorem 5.4.9 Let A be a real or complex n×n matrix. The columns of the matrix $e^{tA}$ form
a basis for the space of solutions of the differential equation $\frac{dX}{dt} = AX$.


Proof. Theorem 5.4.4(b) shows that the columns of $e^{tA}$ solve the differential equation. To
show that every solution is a linear combination of the columns, we copy the proof given at
the beginning of Section 5.3. Let $X(t)$ be an arbitrary solution. We differentiate the matrix
product $e^{-tA}X(t)$ using the product rule (5.3.6):

(5.4.10) $$\frac{d}{dt}\bigl(e^{-tA}X(t)\bigr) = \bigl(-Ae^{-tA}\bigr)X(t) + e^{-tA}\bigl(AX(t)\bigr).$$

Fortunately, A and $e^{-tA}$ commute. This follows directly from the definition of the
exponential. So the derivative is zero. Therefore $e^{-tA}X(t)$ is a constant column vector, say
$C = (c_1, \ldots, c_n)^t$, and $X(t) = e^{tA}C$. This expresses $X(t)$ as a linear combination of the
columns of $e^{tA}$, with coefficients $c_i$. The expression is unique because $e^{tA}$ is an invertible
matrix. □


Though the matrix exponential always solves the differential equation (5.3.7), it may
not be easy to apply in a concrete situation because computation of the exponential can be
difficult. But if A is diagonalizable, the exponential can be computed as in (5.4.8). We can
use this method of evaluating $e^{tA}$ to solve equation (5.3.7). Of course we will get the same
solutions as we did before. Thus if A, P, and $\Lambda$ are as in (5.3.16), then

$$e^{tA} = Pe^{t\Lambda}P^{-1} = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} e^{5t} & 0 \\ 0 & e^{2t} \end{bmatrix}\frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} (e^{5t} + 2e^{2t}) & (2e^{5t} - 2e^{2t}) \\ (e^{5t} - e^{2t}) & (2e^{5t} + e^{2t}) \end{bmatrix}.$$

The columns of the matrix on the right form a second basis for the space of solutions
that was obtained in (5.3.17).


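One can also use Jordan form to solve the differential equation. The solutions for
an arbitrary k×k Jordan block $J_\lambda$ (4.7.5) can be determined by computing the matrix
exponential. We write $J_\lambda = \lambda I + N$, as in (4.7.12), where N is the k×k Jordan block $J_0$ with
$\lambda = 0$. Then $N^k = 0$, so

$$e^{tN} = I + \frac{tN}{1!} + \cdots + \frac{t^{k-1}N^{k-1}}{(k-1)!}.$$

Since N and $\lambda I$ commute,

$$e^{tJ} = e^{\lambda tI}e^{tN} = e^{\lambda t}\left(I + \frac{tN}{1!} + \cdots + \frac{t^{k-1}N^{k-1}}{(k-1)!}\right).$$

Thus if J is the 3×3 block

$$J = \begin{bmatrix} 3 & 0 & 0 \\ 1 & 3 & 0 \\ 0 & 1 & 3 \end{bmatrix},$$

then

$$e^{tJ} = e^{3t}\begin{bmatrix} 1 & 0 & 0 \\ t & 1 & 0 \\ \frac{1}{2!}t^2 & t & 1 \end{bmatrix} = \begin{bmatrix} e^{3t} & 0 & 0 \\ te^{3t} & e^{3t} & 0 \\ \frac{1}{2!}t^2e^{3t} & te^{3t} & e^{3t} \end{bmatrix}.$$

The columns of this matrix form a basis for the space of solutions of the differential
equation $\frac{dX}{dt} = JX$.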
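
Because the nilpotent part N contributes only finitely many terms, the exponential of a Jordan block can be computed exactly as a finite sum. The Python sketch below does this for a general block size. It assumes the convention that N has its 1's just below the diagonal (with the other convention the result is the transpose), and the function name `jordan_block_exp` is ours.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def jordan_block_exp(lam, k, t):
    """e^{tJ} for the k x k block J = lam*I + N, using e^{tJ} = e^{lam t} e^{tN}."""
    # Assumed convention: N has 1s just below the diagonal, so N^k = 0.
    n_mat = [[1.0 if i == j + 1 else 0.0 for j in range(k)] for i in range(k)]
    power = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]  # N^0
    total = [[0.0] * k for _ in range(k)]
    for m in range(k):   # the finite sum of t^m N^m / m! for m = 0, ..., k-1
        coeff = t ** m / math.factorial(m)
        total = [[total[i][j] + coeff * power[i][j] for j in range(k)]
                 for i in range(k)]
        power = mat_mul(power, n_mat)
    scale = math.exp(lam * t)   # e^{lam t I} acts as multiplication by a scalar
    return [[scale * entry for entry in row] for row in total]


for row in jordan_block_exp(3.0, 3, 0.5):
    print(row)
```

The entry in row i and column j (with i at least j) is $e^{\lambda t}\,t^{i-j}/(i-j)!$, which is the pattern visible in the 3×3 example.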


We now go back to prove Theorem 5.4.4. The main facts about limits of series that we 
will use are given below, together with references to [Mattuck] and [Rudin]. Those authors 
consider only real valued functions, but the proofs carry over to complex valued functions 
because limits and derivatives of complex valued functions can be defined by working on the 
real and imaginary parts separately. 

If r and s are real numbers with r < s, the notation $[r, s]$ stands for the interval
$r \le t \le s$.


Theorem 5.4.11 ([Mattuck], Theorem 22.2B, [Rudin], Theorem 7.9). Let $m_k$ be a series of
positive real numbers such that $\sum m_k$ converges. If $u^{(k)}(t)$ are functions on an interval $[r, s]$,
and if $|u^{(k)}(t)| \le m_k$ for all k and all t in the interval, then the series $\sum u^{(k)}(t)$ converges
uniformly on the interval. □


Theorem 5.4.12 ([Mattuck], Theorem 11.5B, [Rudin], Theorem 7.17). Let $u^{(k)}(t)$ be a
sequence of functions with continuous derivatives on an interval $[r, s]$. Suppose that the
series $\sum u^{(k)}(t)$ converges to a function $f(t)$ and also that the series of derivatives $\sum u^{(k)\prime}(t)$
converges uniformly to a function $g(t)$, on the interval. Then f is differentiable on the
interval, and its derivative is g. □

Proof of Theorem 5.4.4(a). We denote the i, j-entry of a matrix A by $(A)_{ij}$ here. So $(AB)_{ij}$
stands for the entry of the product matrix AB, and $(A^k)_{ij}$ for the entry of the kth power $A^k$.
With this notation, the i, j-entry of $e^A$ is the sum of the series

(5.4.13) $$(e^A)_{ij} = (I)_{ij} + \frac{(A)_{ij}}{1!} + \frac{(A^2)_{ij}}{2!} + \frac{(A^3)_{ij}}{3!} + \cdots.$$

To prove that the series for the exponential converges absolutely and uniformly, we need to
show that the entries of the powers $A^k$ do not grow too quickly.

We denote by $\|A\|$ the maximum absolute value of the entries of a matrix A, the smallest
real number such that

(5.4.14) $$|(A)_{ij}| \le \|A\| \quad\text{for all } i, j.$$

Its basic property is this:


Lemma 5.4.15 Let A and B be complex n×n matrices. Then $\|AB\| \le n\|A\|\,\|B\|$, and for all
$k > 0$, $\|A^k\| \le n^{k-1}\|A\|^k$.


Proof. We estimate the size of the i, j-entry of AB:

$$|(AB)_{ij}| = \Bigl|\sum_{\nu=1}^{n} (A)_{i\nu}(B)_{\nu j}\Bigr| \le \sum_{\nu=1}^{n} |(A)_{i\nu}|\,|(B)_{\nu j}| \le n\|A\|\,\|B\|.$$

The second inequality follows by induction from the first one. □


We now estimate the exponential series: Let a be a positive real number such that
$n\|A\| \le a$. The lemma tells us that $|(A^k)_{ij}| \le a^k$ (with one n to spare). So

(5.4.16) $$\bigl|(e^A)_{ij}\bigr| \le |(I)_{ij}| + |(A)_{ij}| + \frac{1}{2!}|(A^2)_{ij}| + \frac{1}{3!}|(A^3)_{ij}| + \cdots \le 1 + \frac{a}{1!} + \frac{a^2}{2!} + \frac{a^3}{3!} + \cdots.$$

The ratio test shows that the last series converges (to $e^a$ of course). Theorem 5.4.11 shows
that the series for $e^A$ converges absolutely and uniformly for all A with $n\|A\| \le a$. □


Proof of Theorem 5.4.4(b),(c). We use a trick to shorten the proofs. That is to begin by
differentiating the series for $e^{tA+B}$, assuming that A and B are commuting n×n matrices.
The derivative of $tA + B$ is A, and

(5.4.17) $$e^{tA+B} = I + \frac{(tA + B)}{1!} + \frac{(tA + B)^2}{2!} + \cdots.$$

Using the product rule (5.3.6), we see that, for $k > 0$, the derivative of the term of degree k
of this series is

$$\frac{d}{dt}\left(\frac{(tA + B)^k}{k!}\right) = \frac{1}{k!}\sum_{i=1}^{k}(tA + B)^{i-1}A\,(tA + B)^{k-i}.$$

Since $AB = BA$, we can pull the A in the middle out to the left:

(5.4.18) $$\frac{d}{dt}\left(\frac{(tA + B)^k}{k!}\right) = kA\,\frac{(tA + B)^{k-1}}{k!} = A\,\frac{(tA + B)^{k-1}}{(k-1)!}.$$

This is the product of the matrix A and the term of degree $k - 1$ of the exponential series.
So term-by-term differentiation of (5.4.17) yields the series for $Ae^{tA+B}$.


To justify term-by-term differentiation, we apply Theorem 5.4.4(a). The theorem shows
that for given A and B, the exponential series $e^{tA+B}$ converges uniformly on any interval
$r \le t \le s$. Moreover, the series of derivatives converges uniformly to $Ae^{tA+B}$. By Theorem
5.4.12, the derivative of $e^{tA+B}$ can be computed term by term, so it is true that

$$\frac{d}{dt}e^{tA+B} = Ae^{tA+B}$$

for any pair A, B of matrices that commute. Taking $B = 0$ proves Theorem 5.4.4(b).


Next, we copy the method used in the proof of Theorem 5.4.9. We differentiate the
product $e^{-tA}e^{tA+B}$, again assuming that A and B commute. As in (5.4.10), we find that

$$\frac{d}{dt}\bigl(e^{-tA}e^{tA+B}\bigr) = \bigl(-Ae^{-tA}\bigr)\bigl(e^{tA+B}\bigr) + \bigl(e^{-tA}\bigr)\bigl(Ae^{tA+B}\bigr) = 0.$$

Therefore $e^{-tA}e^{tA+B} = C$, where C is a constant matrix. Setting $t = 0$ shows that $e^B = C$.
Setting $B = 0$ shows that $e^{-tA} = (e^{tA})^{-1}$. Then $(e^{tA})^{-1}e^{tA+B} = e^B$. Setting $t = 1$ shows that
$e^{A+B} = e^Ae^B$. This proves Theorem 5.4.4(c). □
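
The role of the commuting hypothesis in part (c) can be illustrated numerically. The Python sketch below approximates exponentials by partial sums of the series and compares $e^{A+B}$ with $e^Ae^B$ for two pairs of matrices: one pair that does not commute and one that does. The matrices, the 40-term cutoff, and the helper names are our own choices for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_exp(a, terms=40):
    """Partial sum of the exponential series I + A/1! + A^2/2! + ..."""
    n = len(a)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, a)]
        total = mat_add(total, term)
    return total


def gap(a, b):
    """Largest entry of e^{A+B} - e^A e^B in absolute value."""
    left = mat_exp(mat_add(a, b))
    right = mat_mul(mat_exp(a), mat_exp(b))
    return max(abs(left[i][j] - right[i][j])
               for i in range(len(a)) for j in range(len(a)))


A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]      # AB != BA
C = [[1.0, 2.0], [0.0, 1.0]]
D = [[3.0, 5.0], [0.0, 3.0]]      # CD = DC
print(gap(A, B), gap(C, D))
```

For the first pair the two sides differ visibly, since $e^Ae^B$ has rows $(2, 1)$ and $(1, 1)$ while $e^{A+B}$ has entries $\cosh 1$ and $\sinh 1$. For the commuting pair the difference is at the level of rounding error.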


We will use the remarkable properties of the matrix exponential again, in Chapter 9. 


I have not thought it necessary to undertake the labour
of a formal proof of the theorem in the general case.


—Arthur Cayley! 


EXERCISES 


Section 1 Orthogonal Matrices and Rotations 


1.1. Determine the matrices that represent the following rotations of $\mathbb{R}^3$:

(a) angle $\theta$, the axis $e_2$, (b) angle $2\pi/3$, axis contains the vector $(1, 1, 1)^t$, (c) angle $\pi/2$,
axis contains the vector $(1, 1, 0)^t$.


1.2. What are the complex eigenvalues of the matrix A that represents a rotation of $\mathbb{R}^3$ through
the angle $\theta$ about a pole u?


1.3. Is $O_n$ isomorphic to the product group $SO_n \times \{\pm I\}$?


1.4. Describe geometrically the action of an orthogonal 3×3 matrix with determinant $-1$.


¹ Arthur Cayley, one of the mathematicians for whom the Cayley-Hamilton Theorem is named, stated that
theorem for n×n matrices in one of his papers, and then checked the 2×2 case (see (5.2.4)). He closed his
discussion of the theorem with the sentence quoted here.

1.5. Let A be a 3×3 orthogonal matrix with det A = 1, whose angle of rotation is different from
0 or $\pi$, and let $M = A - A^t$.


(a) Show that M has rank 2, and that a nonzero vector X in the nullspace of M is an 
eigenvector of A with eigenvalue 1. 


(b) Find such an eigenvector explicitly in terms of the entries of the matrix A. 


Section 2 Using Continuity 


2.1. Use the Cayley-Hamilton Theorem to express $A^{-1}$ in terms of A, $(\det A)^{-1}$, and the
coefficients of the characteristic polynomial. Verify your expression in the 2×2 case.


2.2. Let A be m×m and B be n×n complex matrices, and consider the linear operator T on
the space $\mathbb{C}^{m \times n}$ of all m×n complex matrices defined by $T(M) = AMB$.


(a) Show how to construct an eigenvector for T out of a pair of column vectors X, Y, where
X is an eigenvector for A and Y is an eigenvector for $B^t$.


(b) Determine the eigenvalues of T in terms of those of A and B.
(c) Determine the trace of this operator.

2.3. Let A be an n×n complex matrix.


(a) Consider the linear operator T defined on the space $\mathbb{C}^{n \times n}$ of all complex n×n matrices
by the rule $T(M) = AM - MA$. Prove that the rank of this operator is at most $n^2 - n$.


(b) Determine the eigenvalues of T in terms of the eigenvalues $\lambda_1, \ldots, \lambda_n$ of A.

2.4. Let A and B be diagonalizable complex matrices. Prove that there is an invertible matrix P
such that $P^{-1}AP$ and $P^{-1}BP$ are both diagonal if and only if $AB = BA$.

Section 3 Systems of Differential Equations


3.1. Prove the product rule for differentiation of matrix-valued functions. 
3.2. Let $A(t)$ and $B(t)$ be differentiable matrix-valued functions of t. Compute

(a) $\frac{d}{dt}\bigl(A(t)^3\bigr)$, (b) $\frac{d}{dt}\bigl(A(t)^{-1}\bigr)$, (c) $\frac{d}{dt}\bigl(A(t)^{-1}B(t)\bigr)$.


3.3. Solve the equation $\frac{dX}{dt} = AX$ for the following matrices A:


2 1 1 ij ‘ome; ae} 0 0 1 
(a) k |.) g |. Geo a ayeyet 00"). 
: 0 0 -1 inal 36) 
3.4. Let A and B be constant matrices, with A invertible. Solve the inhomogeneous differential
equation $\frac{dX}{dt} = AX + B$ in terms of the solutions to the equation $\frac{dX}{dt} = AX$.


Section 4 The Matrix Exponential 
4.1. Compute $e^A$ for the following matrices A:


a b| wm [27% 271] @ [9 ~2 @ | | (e) i . 
4.2. Prove the formula $e^{\operatorname{trace} A} = \det(e^A)$.

4.3. Let X be an eigenvector of an n×n matrix A, with eigenvalue $\lambda$.

(a) Prove that if A is invertible then X is an eigenvector for $A^{-1}$, with eigenvalue $\lambda^{-1}$.

(b) Prove that X is an eigenvector for $e^A$, with eigenvalue $e^\lambda$.

4.4. Let A and B be commuting matrices. To prove that $e^{A+B} = e^Ae^B$, one can begin by
expanding the two sides into double sums whose terms are multiples of $A^iB^j$. Prove that
the two double sums one obtains are the same.

4.5. Solve the differential equation $\frac{dX}{dt} = AX$ when A is the given matrix:


i 
os 0 0 
(a) » (b) »@}1 1 
a? 1 0 
Tal 

4.6. For an n×n matrix A, define sin A and cos A by using the Taylor's series expansions for
sin x and cos x.

(a) Prove that these series converge for all A.
(b) Prove that $\sin(tA)$ is a differentiable function of t and that $\frac{d}{dt}\sin(tA) = A\cos(tA)$.

4.7. Discuss the range of validity of the following identities:

(a) $\cos^2 A + \sin^2 A = I$,
(b) $e^{iA} = \cos A + i\sin A$,
(c) $\sin(A + B) = \sin A\cos B + \cos A\sin B$,
(d) $e^{2\pi iA} = I$,
(e) $\frac{d(e^{A(t)})}{dt} = e^{A(t)}\frac{dA}{dt}$, when $A(t)$ is a differentiable matrix-valued function of t.

4.8. Let P, $B_k$, and B be n×n matrices, with P invertible. Prove that if $B_k$ converges to B, then
$P^{-1}B_kP$ converges to $P^{-1}BP$.

Miscellaneous Problems 


M.1. Determine the group $O_n(\mathbb{Z})$ of orthogonal matrices with integer entries.

M.2. Prove the Cayley-Hamilton Theorem using Jordan form.

M.3. Let A be an n×n complex matrix. Prove that if trace $A^k = 0$ for all $k > 0$, then A is
nilpotent.

M.4. Let A be a complex n×n matrix all of whose eigenvalues have absolute value less than 1.
Prove that the series $I + A + A^2 + \cdots$ converges to $(I - A)^{-1}$.

M.5. The Fibonacci numbers 0, 1, 1, 2, 3, 5, 8, ..., are defined by the recursive relations
$f_n = f_{n-1} + f_{n-2}$, with the initial conditions $f_0 = 0$, $f_1 = 1$. This recursive relation can be
written in matrix form as

$$\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} f_{n-2} \\ f_{n-1} \end{bmatrix} = \begin{bmatrix} f_{n-1} \\ f_n \end{bmatrix}.$$

(a) Prove the formula $f_n = \frac{1}{a}\left[\left(\frac{1+a}{2}\right)^n - \left(\frac{1-a}{2}\right)^n\right]$, where $a = \sqrt{5}$.

(b) Suppose that a sequence $a_n$ is defined by the relation $a_n = \frac{1}{2}(a_{n-1} + a_{n-2})$. Compute
the limit of the sequence $a_n$ in terms of $a_0$, $a_1$.

M.6. (an integral operator) The space C of continuous functions $f(u)$ on the interval [0, 1] is one
of many infinite-dimensional analogues of $\mathbb{R}^n$, and continuous functions $A(u, v)$ on the
square $0 \le u, v \le 1$ are infinite-dimensional analogues of matrices. The integral

$$A \cdot f = \int_0^1 A(u, v)f(v)\,dv$$

is analogous to multiplication of a matrix and a vector. (To visualize this, rotate the unit
square in the u, v-plane and the interval [0, 1] by 90° in the clockwise direction.) The
response of a bridge to a variable load could, with suitable assumptions, be represented
by such an integral. For this, f would represent the load along the bridge, and then $A \cdot f$
would compute the vertical deflection of the bridge caused by that load.

This problem treats the integral as a linear operator. For the function $A = u + v$,
determine the image of the operator explicitly. Determine its nonzero eigenvalues, and
describe its kernel in terms of the vanishing of some integrals. Do the same for the function
$A = u^2 + v^2$.

M.7. Let A be a 2×2 complex matrix with distinct eigenvalues, and let X be an indeterminate
2×2 matrix. How many solutions to the matrix equation $X^2 = A$ can there be?

M.8. Find a geometric way to determine the axis of rotation for the composition of two three-
dimensional rotations.


CHAPTER 6


Symmetry 


L'algèbre n'est qu'une géométrie écrite;
la géométrie n'est qu'une algèbre figurée.


—Sophie Germain 


Symmetry provides some of the most appealing applications of groups. Groups were 
invented to analyze symmetries of certain algebraic structures, field extensions (Chapter 16), 
and because symmetry is a common phenomenon, it is one of the two main ways in which 
group theory is applied. The other is through group representations, which are discussed in 
Chapter 10. The symmetries of plane figures, which we study in the first sections, provide a 
rich source of examples and a background for the general concept of a group operation that 
is introduced in Section 6.7. 

We allow free use of geometric reasoning. Carrying the arguments back to the axioms 
of geometry will be left for another occasion. 


6.1 SYMMETRY OF PLANE FIGURES 


Symmetries of plane figures are usually classified into the types shown below: 




(6.1.1) Bilateral Symmetry. 




(G22) Rotational Symmetry. 


(6.1.3) Translational Symmetry. 


Figures such as these are supposed to extend indefinitely in both directions. There is also a 
fourth type of symmetry, though its name, glide symmetry, may be less familiar: 




(6.1.4) Glide Symmetry. 


Figures such as the wallpaper pattern shown below may have two independent translational 
symmetries, 


(6.1.5) 


and other combinations of symmetries may occur. The star has bilateral as well as rotational 
symmetry. In the figure below, translational and rotational symmetry are combined: 




(6.1.6) 


Another example: 


(6.1.7) 
A rigid motion of the plane is called an isometry, and if an isometry carries a subset
F of the plane to itself, it is called a symmetry of F. The set of all symmetries of F forms a 
subgroup of the group of all isometries of the plane: If m and m’ carry F to F, then so does 
the composed map mm’, and so on. This is the group of symmetries of F. 

Figure 6.1.3 has an infinite cyclic group of symmetries, generated by the translation
t that carries the figure one unit to the left:

$$G = \{\ldots, t^{-2}, t^{-1}, 1, t, t^2, \ldots\}.$$

Figure 6.1.7 has symmetries in addition to translations. 


6.2 ISOMETRIES 


The distance between points of $\mathbb{R}^n$ is the length $|u - v|$ of the vector $u - v$. An isometry of
n-dimensional space $\mathbb{R}^n$ is a distance-preserving map f from $\mathbb{R}^n$ to itself, a map such that,
for all u and v in $\mathbb{R}^n$,

(6.2.1) $$|f(u) - f(v)| = |u - v|.$$

An isometry will map a figure to a congruent figure. 


Examples 6.2.2 


(a) Orthogonal linear operators are isometries. 


Because an orthogonal operator $\varphi$ is linear, $\varphi(u) - \varphi(v) = \varphi(u - v)$, so $|\varphi(u) - \varphi(v)| =
|\varphi(u - v)|$, and because $\varphi$ is orthogonal, it preserves dot products and therefore lengths,
so $|\varphi(u - v)| = |u - v|$.

(b) Translation $t_a$ by a vector a, the map defined by $t_a(x) = x + a$, is an isometry.


Translations are not linear operators because they don't send 0 to 0, except of course
for translation by the zero vector, which is the identity map.


(c) The composition of isometries is an isometry. □


Theorem 6.2.3 The following conditions on a map $\varphi\colon \mathbb{R}^n \to \mathbb{R}^n$ are equivalent:


(a) $\varphi$ is an isometry that fixes the origin: $\varphi(0) = 0$,
(b) $\varphi$ preserves dot products: $(\varphi(v) \cdot \varphi(w)) = (v \cdot w)$, for all v and w,
(c) $\varphi$ is an orthogonal linear operator.


We have seen that (c) implies (a). The neat proof of the implication (b) => (c) that we 
present next was found a few years ago by Sharon Hollander, when she was a student in an 
MIT algebra class. 


Lemma 6.2.4 Let x and y be points of $\mathbb{R}^n$. If the three dot products $(x \cdot x)$, $(x \cdot y)$, and
$(y \cdot y)$ are equal, then $x = y$.

Proof. Suppose that $(x \cdot x) = (x \cdot y) = (y \cdot y)$. Then

$$((x - y) \cdot (x - y)) = (x \cdot x) - 2(x \cdot y) + (y \cdot y) = 0.$$

The length of $x - y$ is zero, and therefore $x = y$. □


Proof of Theorem 6.2.3, (b) => (c): Let $\varphi$ be a map that preserves dot product. Then it will
be orthogonal, provided that it is a linear operator (5.1.12). To prove that $\varphi$ is a linear
operator, we must show that $\varphi(u + v) = \varphi(u) + \varphi(v)$ and that $\varphi(cv) = c\varphi(v)$, for all u and
v and all scalars c.

Given x in $\mathbb{R}^n$, we'll use the symbol $x'$ to stand for $\varphi(x)$. We also introduce the symbol
w for the sum, writing $w = u + v$. Then the relation $\varphi(u + v) = \varphi(u) + \varphi(v)$ that is to be
shown becomes $w' = u' + v'$.

We substitute $x = w'$ and $y = u' + v'$ into Lemma 6.2.4. To show that $w' = u' + v'$, it
suffices to show that the three dot products

$$(w' \cdot w'), \quad (w' \cdot (u' + v')), \quad\text{and}\quad ((u' + v') \cdot (u' + v'))$$

are equal. We expand the second and third dot products. It suffices to show that

$$(w' \cdot w') = (w' \cdot u') + (w' \cdot v') = (u' \cdot u') + 2(u' \cdot v') + (v' \cdot v').$$

By hypothesis, $\varphi$ preserves dot products. So we may drop the primes: $(w' \cdot w') = (w \cdot w)$,
etc. Then it suffices to show that

(6.2.5) $$(w \cdot w) = (w \cdot u) + (w \cdot v) = (u \cdot u) + 2(u \cdot v) + (v \cdot v).$$

Now whereas $w' = u' + v'$ is to be shown, $w = u + v$ is true by definition. So we may
substitute $u + v$ for w. Then (6.2.5) becomes true.

To prove that $\varphi(cv) = c\varphi(v)$, we write $u = cv$, and we must show that $u' = cv'$. The
proof is analogous to the one we have just given. □


Proof of Theorem 6.2.3, (a) => (b): Let $\varphi$ be an isometry that fixes the origin. With the
prime notation, the distance-preserving property of $\varphi$ reads

(6.2.6) $$((u' - v') \cdot (u' - v')) = ((u - v) \cdot (u - v)),$$

for all u and v in $\mathbb{R}^n$. We substitute $v = 0$. Since $0' = 0$, $(u' \cdot u') = (u \cdot u)$. Similarly,
$(v' \cdot v') = (v \cdot v)$. Now (b) follows when we expand (6.2.6) and cancel $(u \cdot u)$ and $(v \cdot v)$
from the two sides of the equation. □


Corollary 6.2.7 Every isometry f of $\mathbb{R}^n$ is the composition of an orthogonal linear operator
and a translation. More precisely, if f is an isometry and if $f(0) = a$, then $f = t_a\varphi$, where
$t_a$ is a translation and $\varphi$ is an orthogonal linear operator. This expression for f is unique.


Proof. Let f be an isometry, let $a = f(0)$, and let $\varphi = t_{-a}f$. Then $t_a\varphi = f$. The corollary
amounts to the assertion that $\varphi$ is an orthogonal linear operator. Since $\varphi$ is the composition
of the isometries $t_{-a}$ and f, it is an isometry. Also, $\varphi(0) = t_{-a}f(0) = t_{-a}(a) = 0$, so $\varphi$ fixes
the origin. Theorem 6.2.3 shows that $\varphi$ is an orthogonal linear operator. The expression
$f = t_a\varphi$ is unique because, since $\varphi(0) = 0$, we must have $a = f(0)$, and then $\varphi = t_{-a}f$. □


To work with the expressions $t_a\varphi$ for isometries, we need to determine the product
(the composition) of two such expressions. We know that the composition $\varphi\psi$ of orthogonal
operators is an orthogonal operator. The other rules are:

(6.2.8) $$t_at_b = t_{a+b} \quad\text{and}\quad \varphi t_a = t_{a'}\varphi, \quad\text{where } a' = \varphi(a).$$

We verify the last relation: $\varphi t_a(x) = \varphi(x + a) = \varphi(x) + \varphi(a) = \varphi(x) + a' = t_{a'}\varphi(x)$.


Corollary 6.2.9 The set of all isometries of $\mathbb{R}^n$ forms a group that we denote by $M_n$, with
composition of functions as its law of composition.


Proof. The composition of isometries is an isometry, and the inverse of an isometry is an
isometry too, because orthogonal operators and translations are invertible, and if $f = t_a\varphi$,
then $f^{-1} = \varphi^{-1}t_a^{-1} = \varphi^{-1}t_{-a}$. This is a composition of isometries. □


Note: It isn’t very easy to verify, directly from the definition, that an isometry is invertible. 
The Homomorphism $M_n \to O_n$


There is an important map $\pi\colon M_n \to O_n$, defined by dropping the translation part of an
isometry f. We write f (uniquely) in the form $f = t_a\varphi$, and define $\pi(f) = \varphi$.


Proposition 6.2.10 The map $\pi$ is a surjective homomorphism. Its kernel is the set $T = \{t_v\}$
of translations, which is a normal subgroup of $M_n$.


Proof. It is obvious that $\pi$ is surjective, and once we show that $\pi$ is a homomorphism, it
will be obvious that T is its kernel, hence that T is a normal subgroup. We must show that
if f and g are isometries, then $\pi(fg) = \pi(f)\pi(g)$. Say that $f = t_a\varphi$ and $g = t_b\psi$, so that
$\pi(f) = \varphi$ and $\pi(g) = \psi$. Then $\varphi t_b = t_{b'}\varphi$, where $b' = \varphi(b)$, and $fg = t_a\varphi t_b\psi = t_{a+b'}\varphi\psi$.
So $\pi(fg) = \varphi\psi = \pi(f)\pi(g)$. □
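
The composition rule used in this proof can be tried out in the plane. In the Python sketch below an isometry $t_a\varphi$ is stored as a pair `(a, phi)`, with `a` a vector and `phi` a 2×2 orthogonal matrix; this representation, the helper names, and the sample rotation and reflection are our own illustrative choices. The function `compose` uses the formula $t_a\varphi\,t_b\psi = t_{a+\varphi(b)}\,\varphi\psi$.

```python
import math


def rotation(theta):
    """Matrix of the rotation of the plane through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def mat_vec(m, x):
    """Apply a 2x2 matrix to a vector."""
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]


def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(f, x):
    """Apply f = t_a phi, stored as the pair (a, phi): x -> phi(x) + a."""
    a, phi = f
    y = mat_vec(phi, x)
    return [y[0] + a[0], y[1] + a[1]]


def compose(f, g):
    """fg = t_a phi t_b psi = t_{a + phi(b)} (phi psi)."""
    (a, phi), (b, psi) = f, g
    b_prime = mat_vec(phi, b)
    return ([a[0] + b_prime[0], a[1] + b_prime[1]], mat_mul(phi, psi))


reflection = [[1.0, 0.0], [0.0, -1.0]]        # reflection about the first axis
f = ([1.0, 2.0], rotation(math.pi / 3))
g = ([-3.0, 0.5], reflection)
x = [0.7, -1.2]
print(apply(f, apply(g, x)))    # apply g, then f
print(apply(compose(f, g), x))  # apply the composed isometry
```

The second component of `compose(f, g)` is the product of the two orthogonal parts, which is the statement $\pi(fg) = \pi(f)\pi(g)$ in this representation.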


Change of Coordinates 


Let P denote an n-dimensional space. The formula tag for an isometry depends on our 
choice of coordinates, so let’s ask how the formula changes when coordinates are changed. 
We will allow changes by orthogonal matrices and also shifts of the origin by translations. In 
other words, we may change coordinates by any isometry. 

To analyze the effect of such a change, we begin with an isometry $f$, a point $p$ of $P$, and
its image $q = f(p)$, without reference to coordinates. When we introduce our coordinate
system, the space $P$ becomes identified with $\mathbb{R}^n$, and the points $p$ and $q$ have coordinates,
say $x = (x_1, \ldots, x_n)^t$ and $y = (y_1, \ldots, y_n)^t$. Also, the isometry $f$ will have a formula $t_a\varphi$
in terms of the coordinates; let's call that formula $m$. The equation $q = f(p)$ translates to
$y = m(x)$ $(= t_a\varphi(x))$. We want to determine what happens to the coordinate vectors and to
the formula, when we change coordinates. The analogous computation for change of basis
in a linear operator gives the clue: $m$ will be changed by conjugation.

Our change in coordinates will be given by some isometry; let's denote it by $\eta$ (eta).
Let the new coordinate vectors of $p$ and $q$ be $x'$ and $y'$. The new formula $m'$ for $f$ is the one
such that $m'(x') = y'$. We also have the formula $\eta(x') = x$ analogous to the change of basis
formula $PX' = X$ (3.5.11).

We substitute $\eta(x') = x$ and $\eta(y') = y$ into the equation $m(x) = y$, obtaining $m\eta(x') = \eta(y')$,
or $\eta^{-1}m\eta(x') = y'$. The new formula is the conjugate, as expected:


(6.2.11)  $m' = \eta^{-1}m\eta$.


Corollary 6.2.12 The homomorphism $\pi: M_n \to O_n$ (6.2.10) does not change when the
origin is shifted by a translation.


When the origin is shifted by a translation $t_v = \eta$, (6.2.11) reads $m' = t_{-v}mt_v$. Since
translations are in the kernel of $\pi$ and since $\pi$ is a homomorphism, $\pi(m') = \pi(m)$. □
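To see the corollary at work, one can expand $m' = t_{-v}mt_v$ for $m = t_a\varphi$ using (6.2.8): $t_{-v}t_a\varphi t_v = t_{-v}t_at_{\varphi(v)}\varphi = t_{a+\varphi(v)-v}\varphi$. The sketch below, restricted to the plane, checks this against the defining property $m'(x') = y'$. The function `shift_origin` and the sample data are assumptions of the illustration.

```python
import math

def rot(theta):
    """Matrix of the rotation through angle theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(M, v):
    """Apply the 2x2 matrix M to the vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def evaluate(m, x):
    """Value at x of the isometry m = (a, phi), that is, of t_a phi."""
    a, phi = m
    y = apply(phi, x)
    return (y[0] + a[0], y[1] + a[1])

def shift_origin(m, v):
    """New formula m' = t_{-v} m t_v after shifting the origin by v."""
    a, phi = m
    phi_v = apply(phi, v)
    # t_{-v} t_a phi t_v = t_{a + phi(v) - v} phi: only the translation part changes.
    return ((a[0] + phi_v[0] - v[0], a[1] + phi_v[1] - v[1]), phi)

def close(u, w, tol=1e-12):
    return all(abs(p - q) < tol for p, q in zip(u, w))

m = ((2.0, 1.0), rot(math.pi / 4))   # sample formula t_a phi (hypothetical data)
v = (5.0, -3.0)                      # shift of the origin
m_new = shift_origin(m, v)

x_new = (0.5, 0.5)                            # new coordinates x' of a point
x_old = (x_new[0] + v[0], x_new[1] + v[1])    # eta(x') = x with eta = t_v
y_old = evaluate(m, x_old)                    # y = m(x)
y_new = (y_old[0] - v[0], y_old[1] - v[1])    # eta(y') = y

print(close(evaluate(m_new, x_new), y_new))   # m'(x') = y'
print(m_new[1] == m[1])                       # pi(m') = pi(m)
```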


Orientation 


The determinant of an orthogonal operator $\varphi$ on $\mathbb{R}^n$ is $\pm 1$. The operator is said to be
orientation-preserving if its determinant is $1$ and orientation-reversing if its determinant is
$-1$. Similarly, an orientation-preserving (or orientation-reversing) isometry $f$ is one such
that, when it is written in the form $f = t_a\varphi$, the operator $\varphi$ is orientation-preserving (or
orientation-reversing). An isometry of the plane is orientation-reversing if it interchanges
front and back of the plane, and orientation-preserving if it maps the front to the front.

The map


(6.2.13)  $\sigma: M_n \to \{\pm 1\}$


that sends an orientation-preserving isometry to $1$ and an orientation-reversing isometry to
$-1$ is a group homomorphism.


6.3 ISOMETRIES OF THE PLANE


In this section we describe isometries of the plane, both algebraically and geometrically. 

We denote the group of isometries of the plane by $M$. To compute in this group, we
choose some special isometries as generators, and we obtain relations among them. The
relations are somewhat analogous to those that define the symmetric group $S_3$, but because
$M$ is infinite, there are more of them.

We choose a coordinate system and use it to identify the plane $P$ with the space $\mathbb{R}^2$.
Then we choose as generators the translations, the rotations about the origin, and the
reflection about the $e_1$-axis. We denote the rotation through the angle $\theta$ by $\rho_\theta$, and the
reflection about the $e_1$-axis by $r$. These are linear operators whose matrices $R$ and $S_0$ were
exhibited before (see (5.1.17) and (5.1.16)).

(6.3.1)

1. translation $t_a$ by a vector $a$:  $t_a(x) = x + a = \begin{bmatrix} x_1 + a_1 \\ x_2 + a_2 \end{bmatrix}$.

2. rotation $\rho_\theta$ by an angle $\theta$ about the origin:  $\rho_\theta(x) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$.

3. reflection $r$ about the $e_1$-axis:  $r(x) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$.


We haven’t listed all of the isometries. Rotations about a point other than the origin 
aren’t included, nor are reflections about other lines, or glides. However, every element of 
M is a product of these isometries, so they generate the group. 


Theorem 6.3.2 Let $m$ be an isometry of the plane. Then $m = t_v\rho_\theta$, or else $m = t_v\rho_\theta r$, for
a uniquely determined vector $v$ and angle $\theta$, possibly zero.


Proof. Corollary 6.2.7 asserts that any isometry $m$ is written uniquely in the form $m = t_v\varphi$
where $\varphi$ is an orthogonal operator. And the orthogonal linear operators on $\mathbb{R}^2$ are the
rotations $\rho_\theta$ about the origin and the reflections about lines through the origin. The
reflections have the form $\rho_\theta r$ (see (5.1.17)). □


An isometry of the form $t_v\rho_\theta$ preserves orientation while $t_v\rho_\theta r$ reverses orientation.
Computation in $M$ can be done with the symbols $t_v$, $\rho_\theta$, and $r$, using the following rules
for composing them. The rules can be verified using Formulas 6.3.1 (see also (6.2.8)).


(6.3.3)
  $\rho_\theta t_v = t_{v'}\rho_\theta$, where $v' = \rho_\theta(v)$,
  $r t_v = t_{v'} r$, where $v' = r(v)$,
  $r\rho_\theta = \rho_{-\theta} r$,
  $t_v t_w = t_{v+w}$,  $\rho_\theta\rho_\eta = \rho_{\theta+\eta}$,  and  $rr = 1$.


The next theorem describes the isometries of the plane geometrically. 


Theorem 6.3.4 Every isometry of the plane has one of the following forms:
(a) orientation-preserving isometries:
   (i) translation: a map $t_v$ that sends $p \mapsto p + v$.
   (ii) rotation: rotation of the plane through a nonzero angle $\theta$ about some point.
(b) orientation-reversing isometries:
   (i) reflection: a bilateral symmetry about a line $\ell$.
   (ii) glide reflection (or glide for short): reflection about a line $\ell$, followed by translation
   by a nonzero vector parallel to $\ell$.


The proof of this remarkable theorem is below. One of its consequences is that the
composition of rotations about two different points is a rotation about a third point, unless it
is a translation. This isn't obvious, but it follows from the theorem, because the composition
preserves orientation.

Some compositions are easier to visualize. The composition of rotations through
angles $\alpha$ and $\beta$ about the same point is a rotation about that point, through the angle
$\alpha + \beta$. The composition of translations by the vectors $a$ and $b$ is the translation by their
sum $a + b$.

The composition of reflections about nonparallel lines $\ell_1$, $\ell_2$ is a rotation about the
intersection point $p = \ell_1 \cap \ell_2$. This also follows from the theorem, because the composition
is orientation-preserving, and it fixes $p$. The composition of reflections about parallel lines
is a translation by a vector orthogonal to the lines.


Proof of Theorem (6.3.4). We consider orientation-preserving isometries first. Let $f$ be an
isometry that preserves orientation but is not a translation. We must prove that $f$ is a
rotation about some point. We choose coordinates to write the formula for $f$ as $m = t_a\rho_\theta$
as in (6.3.3). Since $m$ is not a translation, $\theta \neq 0$.


Lemma 6.3.5 An isometry $f$ that has the form $m = t_a\rho_\theta$, with $\theta \neq 0$, is a rotation through
the angle $\theta$ about a point in the plane.


Proof. To simplify notation, we denote $\rho_\theta$ by $\rho$. To show that $f$ represents a rotation with
angle $\theta$ about some point $p$, we change coordinates by a translation $t_p$. We hope to choose
$p$ so that the new formula for the isometry $f$ becomes $m' = \rho$. If so, then $f$ will be rotation
with angle $\theta$ about the point $p$.

The rule for change of coordinates is $t_p(x') = x$, and therefore the new formula for $f$ is
$m' = t_p^{-1}mt_p = t_{-p}t_a\rho t_p$ (6.2.11). We use the rules (6.3.3): $\rho t_p = t_{p'}\rho$, where $p' = \rho(p)$.
Then if $b = -p + a + p' = a + \rho(p) - p$, we will have $m' = t_b\rho$. We wish to choose $p$ such
that $b = 0$.

Let $I$ denote the identity operator, and let $c = \cos\theta$ and $s = \sin\theta$. The matrix of the
linear operator $I - \rho$ is


(6.3.6)  $\begin{bmatrix} 1 - c & s \\ -s & 1 - c \end{bmatrix}$.


Its determinant is $2 - 2c = 2 - 2\cos\theta$. The determinant isn't zero unless $\cos\theta = 1$, and this
happens only when $\theta = 0$. Since $\theta \neq 0$, the equation $(I - \rho)p = a$ has a unique solution for
$p$. The equation can be solved explicitly when needed. □
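As an illustration of solving the equation explicitly, the sketch below inverts the matrix (6.3.6) by the usual $2 \times 2$ formula and checks that the resulting point is fixed by $t_a\rho_\theta$. The function name `fixed_point` and the sample values are assumptions of this example.

```python
import math

def fixed_point(a, theta):
    """Solve (I - rho_theta) p = a for p, using the inverse of the matrix (6.3.6)."""
    c, s = math.cos(theta), math.sin(theta)
    det = 2.0 - 2.0 * c              # determinant of I - rho
    if abs(det) < 1e-15:
        raise ValueError("theta is a multiple of 2*pi, so t_a rho is a translation")
    # inverse of [[1-c, s], [-s, 1-c]] is (1/det) * [[1-c, -s], [s, 1-c]]
    return (((1 - c) * a[0] - s * a[1]) / det,
            (s * a[0] + (1 - c) * a[1]) / det)

def t_a_rho(a, theta, x):
    """The isometry t_a rho_theta applied to x."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1] + a[0], s * x[0] + c * x[1] + a[1])

a, theta = (1.0, 0.0), math.pi / 2   # sample data (hypothetical choice)
p = fixed_point(a, theta)
q = t_a_rho(a, theta, p)

print(p)                                         # close to (0.5, 0.5)
print(abs(q[0] - p[0]) < 1e-12 and abs(q[1] - p[1]) < 1e-12)
```

For this sample, rotating $(0.5, 0.5)$ by a quarter turn gives $(-0.5, 0.5)$, and adding $a = (1, 0)$ returns the point to itself.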


The point $p$ is the fixed point of the isometry $t_a\rho_\theta$, and it can be found geometrically,
as illustrated below. The line $\ell$ passes through the origin and is perpendicular to the vector
$a$. The sector with angle $\theta$ is situated so as to be bisected by $\ell$, and the fixed point $p$ is
determined by inserting the vector $a$ into the sector, as shown.


(6.3.7)  The Fixed Point of the Isometry $t_a\rho_\theta$.


To complete the proof of Theorem 6.3.4, we show that an orientation-reversing isometry
$m = t_a\rho_\theta r$ is a glide or a reflection. To do this, we change coordinates. The isometry $\rho_\theta r$
is a reflection about a line $\ell_0$ through the origin. We may as well rotate coordinates so that
$\ell_0$ becomes the horizontal axis. In the new coordinate system, the reflection becomes our
standard reflection $r$, and the translation $t_a$ remains a translation, though the coordinates of
the vector $a$ will have changed. Let's use the same symbol $a$ for this new vector. In the new
coordinate system, the isometry becomes $m = t_ar$. It acts as


$m\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t_a\begin{bmatrix} x_1 \\ -x_2 \end{bmatrix} = \begin{bmatrix} x_1 + a_1 \\ -x_2 + a_2 \end{bmatrix}$.


This isometry is the glide obtained by reflection about the line $\ell: \{x_2 = \tfrac{1}{2}a_2\}$, followed by
translation by the vector $a_1e_1$. If $a_1 = 0$, $m$ is a reflection.
This completes the proof of Theorem 6.3.4. □
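The description of $t_ar$ as a glide can be checked directly in coordinates. In this small sketch (function names are hypothetical), one function applies $t_ar$ and the other reflects about the line $x_2 = \tfrac{1}{2}a_2$ and then slides by $a_1e_1$.

```python
def glide(a, x):
    """m = t_a r: reflect about the e1-axis, then translate by a."""
    return (x[0] + a[0], -x[1] + a[1])

def reflect_then_slide(a, x):
    """Reflect about the line x2 = a2/2, then translate by a1 * e1."""
    reflected = (x[0], a[1] - x[1])       # mirror image in the line x2 = a2/2
    return (reflected[0] + a[0], reflected[1])

a = (3.0, 4.0)                            # sample translation vector
points = [(0.0, 0.0), (1.5, -2.0), (-7.0, 2.0)]

print(all(glide(a, x) == reflect_then_slide(a, x) for x in points))
# A point on the glide line x2 = 2 stays on that line.
print(glide(a, (-7.0, 2.0)))
```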


Corollary 6.3.8 The glide line of the isometry $t_a\rho_\theta r$ is parallel to the line of reflection
of $\rho_\theta r$. □


The isometries that fix the origin are the orthogonal linear operators, so when 
coordinates are chosen, the orthogonal group $O_2$ becomes a subgroup of the group of
isometries M. We may also consider the subgroup of M of isometries that fix a point of the 
plane other than the origin. The relationship of this group with the orthogonal group is given 
in the next proposition. 


Proposition 6.3.9 Assume that coordinates in the plane have been chosen, so that the
orthogonal group $O_2$ becomes the subgroup of $M$ of isometries that fix the origin. Then the group
of isometries that fix a point $p$ of the plane is the conjugate subgroup $t_pO_2t_p^{-1}$.


Proof. If an isometry $m$ fixes $p$, then $t_p^{-1}mt_p$ fixes the origin $o$: $t_p^{-1}mt_p\,o = t_p^{-1}m\,p = t_p^{-1}p = o$.
Conversely, if $m$ fixes $o$, then $t_pmt_p^{-1}$ fixes $p$. □

One can visualize the rotation about a point $p$ this way: First translate by $t_{-p}$ to move $p$ to
the origin, then rotate about the origin, then translate back to $p$.
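The three steps just described translate directly into code. This is a minimal sketch; the function name `rotate_about` and the sample point are assumptions of the illustration.

```python
import math

def rotate_about(p, theta, x):
    """Rotation through theta about p, computed as t_p rho_theta t_{-p}."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = x[0] - p[0], x[1] - p[1]            # t_{-p}: move p to the origin
    rx, ry = c * dx - s * dy, s * dx + c * dy    # rho_theta: rotate about the origin
    return (rx + p[0], ry + p[1])                # t_p: translate back

p = (1.0, 1.0)
fixed = rotate_about(p, math.pi / 2, p)            # the center p does not move
moved = rotate_about(p, math.pi / 2, (2.0, 1.0))   # approximately (1.0, 2.0)
print(fixed, moved)
```

Written in the form $t_a\rho_\theta$, this isometry has translation part $a = p - \rho_\theta(p)$, which is the equation $(I - \rho)p = a$ of Lemma 6.3.5 read in the other direction.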


We go back to the homomorphism $\pi: M \to O_2$ that was defined in (6.2.10). The
discussion above shows this:


Proposition 6.3.10 Let $p$ be a point of the plane, and let $\rho_{\theta,p}$ denote rotation through the
angle $\theta$ about $p$. Then $\pi(\rho_{\theta,p}) = \rho_\theta$. Similarly, if $r_\ell$ is reflection about a line $\ell$ or a glide
with glide line $\ell$ that is parallel to the x-axis, then $\pi(r_\ell) = r$. □


Points and Vectors 


In most of this book, there is no convincing reason to distinguish a point $p$ of the plane
$P = \mathbb{R}^2$ from the vector that goes from the origin $o$ to $p$, which is often written as $\overrightarrow{op}$ in
calculus books. However, when working with isometries, it is best to maintain the distinction.
So we introduce another copy of the plane, we call it $V$, and we think of its elements as
translation vectors. Translation by a vector $v$ in $V$ acts on a point $p$ of $P$ as $t_v(p) = p + v$.
It shifts every point of the plane by $v$.

Both $V$ and $P$ are planes. The difference between them becomes apparent only when
we change coordinates. Suppose that we shift coordinates in $P$ by a translation: $\eta = t_w$. The
rule for changing coordinates is $\eta(p') = p$, or $p' + w = p$. At the same time, an isometry
$m$ changes to $m' = \eta^{-1}m\eta = t_{-w}mt_w$ (6.2.11). If we apply this rule with $m = t_v$, then
$m' = t_{-w}t_vt_w = t_v$. The points of $P$ get new coordinates, but the translation vectors are
unchanged.

On the other hand, if we change coordinates by an orthogonal operator $\varphi$, then
$\varphi(p') = p$, and if $m = t_v$, then $m' = \varphi^{-1}t_v\varphi = t_{v'}$, where $v' = \varphi^{-1}v$. So $\varphi v' = v$. The effect
of change of coordinates by an orthogonal operator is the same on $P$ as on $V$.

The only difference between P and V is that the origin in P needn’t be fixed, whereas 
the zero vector is picked out as the origin in V. 

Orthogonal operators act on V, but they don’t act on P unless the origin is chosen. 


6.4 FINITE GROUPS OF ORTHOGONAL OPERATORS ON THE PLANE 


Theorem 6.4.1 Let $G$ be a finite subgroup of the orthogonal group $O_2$. There is an integer
$n$ such that $G$ is one of the following groups:
(a) $C_n$: the cyclic group of order $n$ generated by the rotation $\rho_\theta$, where $\theta = 2\pi/n$.
(b) $D_n$: the dihedral group of order $2n$ generated by two elements: the rotation $\rho_\theta$, where
$\theta = 2\pi/n$, and a reflection $r'$ about a line $\ell$ through the origin.


We will take a moment to describe the dihedral group $D_n$ before proving the theorem.
This group depends on the line of reflection, but if we choose coordinates so that $\ell$
becomes the horizontal axis, the group will contain our standard reflection $r$, the one whose
matrix is


(6.4.2)  $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.


Then if we also write $\rho$ for $\rho_\theta$, the $2n$ elements of the group will be the $n$ powers $\rho^i$ of
$\rho$ and the $n$ products $\rho^ir$. The rule for commuting $\rho$ and $r$ is


$r\rho = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} c & -s \\ s & c \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \rho^{-1}r$,


where $c = \cos\theta$, $s = \sin\theta$, and $\theta = 2\pi/n$.
To conform with a more customary notation for groups, we denote the rotation $\rho_{2\pi/n}$
by $x$, and the reflection $r$ by $y$.


Proposition 6.4.3 The dihedral group $D_n$ has order $2n$. It is generated by two elements $x$
and $y$ that satisfy the relations


$x^n = 1$,  $y^2 = 1$,  $yx = x^{-1}y$.


The elements of $D_n$ are


$1, x, x^2, \ldots, x^{n-1};\ y, xy, x^2y, \ldots, x^{n-1}y$. □

Using the first two relations (6.4.3), the third one can be rewritten in various ways. It is
equivalent to


(6.4.4)  $xyxy = 1$,  and also to  $yx = x^{n-1}y$.
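The relations can be confirmed numerically for a particular $n$ by taking $x$ to be the rotation matrix with angle $2\pi/n$ and $y$ the matrix (6.4.2). The sketch below does this for $n = 5$ (an arbitrary choice) and also counts the $2n$ elements; the helper names are assumptions of the illustration, and equality is tested only up to rounding error.

```python
import math

IDENTITY = ((1.0, 0.0), (0.0, 1.0))

def rot(theta):
    """Matrix of the rotation through angle theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def power(A, k):
    """k-th power of A for an integer k >= 0."""
    P = IDENTITY
    for _ in range(k):
        P = matmul(P, A)
    return P

def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

n = 5
x = rot(2 * math.pi / n)          # the rotation through 2*pi/n
y = ((1.0, 0.0), (0.0, -1.0))     # the standard reflection r

relations_hold = (
    close(power(x, n), IDENTITY)                             # x^n = 1
    and close(matmul(y, y), IDENTITY)                        # y^2 = 1
    and close(matmul(y, x), matmul(power(x, n - 1), y))      # yx = x^{n-1} y
    and close(matmul(matmul(x, y), matmul(x, y)), IDENTITY)  # xyxy = 1
)

# The n powers x^i and the n products x^i y, rounded so that they can be compared.
elements = [power(x, i) for i in range(n)] + [matmul(power(x, i), y) for i in range(n)]
distinct = {tuple(round(e, 9) for row in M for e in row) for M in elements}

print(relations_hold, len(distinct))
```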


When $n = 3$, the relations are the same as for the symmetric group $S_3$ (2.2.6).

Corollary 6.4.5 The dihedral group $D_3$ and the symmetric group $S_3$ are isomorphic. □


For $n > 3$, the dihedral and symmetric groups are not isomorphic, because $D_n$ has order $2n$,
while $S_n$ has order $n!$.


When $n \geq 3$, the elements of the dihedral group $D_n$ are the orthogonal operators that
carry a regular $n$-sided polygon $\Delta$ to itself: the group of symmetries of $\Delta$. This is easy to
see, and it follows from the theorem: A regular $n$-gon is carried to itself by the rotation by
$2\pi/n$ about its center, and also by some reflections. Theorem 6.4.1 identifies the group of all
symmetries as $D_n$.


The dihedral groups $D_1$, $D_2$ are too small to be symmetry groups of an $n$-gon in the
usual sense. $D_1$ is the group $\{1, r\}$ of two elements. So it is a cyclic group, as is $C_2$. But
the element $r$ of $D_1$ is a reflection, while the element different from the identity in $C_2$ is the
rotation with angle $\pi$. The group $D_2$ contains the four elements $\{1, \rho, r, \rho r\}$, where $\rho$ is
the rotation with angle $\pi$ and $\rho r$ is the reflection about the vertical axis. This group
is isomorphic to the Klein four group.

If we like, we can think of $D_1$ and $D_2$ as groups of symmetry of the 1-gon and 2-gon:


[Figure: the 1-gon and the 2-gon.]


We begin the proof of Theorem 6.4.1 now. A subgroup $\Gamma$ of the additive group $\mathbb{R}^+$ of
real numbers is called discrete if there is a (small) positive real number $\epsilon$ such that every
nonzero element $c$ of $\Gamma$ has absolute value $\geq \epsilon$.


Lemma 6.4.6 Let $\Gamma$ be a discrete subgroup of $\mathbb{R}^+$. Then either $\Gamma = \{0\}$, or $\Gamma$ is the set $\mathbb{Z}a$ of
integer multiples of a positive real number $a$.


Proof. This is very similar to the proof of Theorem 2.3.3, that a nonzero subgroup of $\mathbb{Z}^+$ has
the form $\mathbb{Z}n$.

If $a$ and $b$ are distinct elements of $\Gamma$, then since $\Gamma$ is a group, $a - b$ is in $\Gamma$, and
$|a - b| \geq \epsilon$. Distinct elements of $\Gamma$ are separated by a distance at least $\epsilon$. Since only finitely
many elements separated by $\epsilon$ can fit into any bounded interval, a bounded interval contains
finitely many elements of $\Gamma$.

Suppose that $\Gamma \neq \{0\}$. Then $\Gamma$ contains a nonzero element $b$, and since it is a group, $\Gamma$
contains $-b$ as well. So it contains a positive element, say $a'$. We choose the smallest positive
element $a$ in $\Gamma$. We can do this because we only need to choose the smallest element of the
finite subset of $\Gamma$ in the interval $0 < x \leq a'$.

We show that $\Gamma = \mathbb{Z}a$. Since $a$ is in $\Gamma$ and $\Gamma$ is a group, $\mathbb{Z}a \subset \Gamma$. Let $b$ be an element of
$\Gamma$. Then $b = ra$ for some real number $r$. We take out the integer part of $r$, writing $r = m + r_0$
with $m$ an integer and $0 \leq r_0 < 1$. Since $\Gamma$ is a group, $b' = b - ma$ is in $\Gamma$ and $b' = r_0a$. Then
$0 \leq b' < a$. Since $a$ is the smallest positive element in $\Gamma$, $b'$ must be zero. So $b = ma$, which
is in $\mathbb{Z}a$. This shows that $\Gamma \subset \mathbb{Z}a$, and therefore that $\Gamma = \mathbb{Z}a$. □


Proof of Theorem (6.4.1). Let $G$ be a finite subgroup of $O_2$. We want to show that $G$ is $C_n$
or $D_n$. We remember that the elements of $O_2$ are the rotations $\rho_\theta$ and the reflections $\rho_\theta r$.


Case 1: All elements of $G$ are rotations.


We must prove that $G$ is cyclic. Let $\Gamma$ be the set of real numbers $\theta$ such that $\rho_\theta$ is in
$G$. Then $\Gamma$ is a subgroup of the additive group $\mathbb{R}^+$, and it contains $2\pi$. Since $G$ is finite, $\Gamma$ is
discrete. So $\Gamma$ has the form $\mathbb{Z}\alpha$. Then $G$ consists of the rotations through integer multiples
of the angle $\alpha$. Since $2\pi$ is in $\Gamma$, it is an integer multiple of $\alpha$. Therefore $\alpha = 2\pi/n$ for some
integer $n$, and $G = C_n$.


Case 2: $G$ contains a reflection.


We adjust our coordinates so that the standard reflection $r$ is in $G$. Let $H$ denote the
subgroup consisting of the rotations that are elements of $G$. We apply what has been proved
in Case 1 to conclude that $H$ is the cyclic group generated by $\rho_\theta$, for some angle $\theta = 2\pi/n$.
Then the $2n$ products $\rho_\theta^k$ and $\rho_\theta^kr$, for $0 \leq k \leq n - 1$, are in $G$, so $G$ contains the dihedral
group $D_n$. We claim that $G = D_n$, and to show this we take any element $g$ of $G$. Then $g$
is either a rotation or a reflection. If $g$ is a rotation, then by definition of $H$, $g$ is in $H$. The
elements of $H$ are also in $D_n$, so $g$ is in $D_n$. If $g$ is a reflection, we write it in the form $\rho_\alpha r$
for some rotation $\rho_\alpha$. Since $r$ is in $G$, so is the product $gr = \rho_\alpha$. Therefore $\rho_\alpha$ is a power of
$\rho_\theta$, and again, $g$ is in $D_n$. □


Theorem 6.4.7 Fixed Point Theorem. Let $G$ be a finite group of isometries of the plane.
There is a point in the plane that is fixed by every element of $G$, a point $p$ such that $g(p) = p$
for all $g$ in $G$.


Proof. This is a nice geometric argument. Let $s$ be any point in the plane, and let $S$ be the
set of points that are the images of $s$ under the various isometries in $G$. So each element $s'$
of $S$ has the form $s' = g(s)$ for some $g$ in $G$. This set is called the orbit of $s$ for the action
of $G$. The element $s$ is in the orbit because the identity element $1$ is in $G$, and $s = 1(s)$. A
typical orbit for the case that $G$ is the group of symmetries of a regular pentagon is depicted
below, together with the fixed point $p$ of the operation.

Any element of $G$ will permute the orbit $S$. In other words, if $s'$ is in $S$ and $h$ is in $G$,
then $h(s')$ is in $S$: Say that $s' = g(s)$, with $g$ in $G$. Since $G$ is a group, $hg$ is in $G$. Then
$hg(s)$ is in $S$ and is equal to $h(s')$.


[Figure: an orbit of a point $s$ under the symmetries of a regular pentagon, with the fixed point $p$.]


We list the elements of $S$ arbitrarily, writing $S = \{s_1, \ldots, s_n\}$. The fixed point we are
looking for is the centroid, or center of gravity of the orbit, defined as


(6.4.8)  $p = \tfrac{1}{n}(s_1 + \cdots + s_n)$,


where the right side is computed by vector addition, using an arbitrary coordinate system in 
the plane. 


Lemma 6.4.9 Isometries carry centroids to centroids: Let $S = \{s_1, \ldots, s_n\}$ be a finite set of
points of the plane, and let $p$ be its centroid, as defined by (6.4.8). Let $m$ be an isometry. Let
$m(p) = p'$ and $m(s_i) = s_i'$. Then $p'$ is the centroid of the set $S' = \{s_1', \ldots, s_n'\}$.


The fact that the centroid of our set $S$ is a fixed point follows. An element $g$ of $G$ permutes
the orbit $S$. It sends $S$ to $S$ and therefore it sends $p$ to $p$. □


Proof of Lemma 6.4.9 This can be deduced by physical reasoning. It can be shown
algebraically too. To do so, it suffices to look separately at the cases $m = t_a$ and $m = \varphi$, where
$\varphi$ is an orthogonal operator. Any isometry is obtained from such isometries by composition.


Case 1: $m = t_a$ is a translation. Then $s_i' = s_i + a$ and $p' = p + a$. It is true that

$p' = p + a = \tfrac{1}{n}\big((s_1 + a) + \cdots + (s_n + a)\big) = \tfrac{1}{n}(s_1' + \cdots + s_n')$.

Case 2: $m = \varphi$ is a linear operator. Then

$p' = \varphi(p) = \varphi\big(\tfrac{1}{n}(s_1 + \cdots + s_n)\big) = \tfrac{1}{n}\big(\varphi(s_1) + \cdots + \varphi(s_n)\big) = \tfrac{1}{n}(s_1' + \cdots + s_n')$. □
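The centroid argument can be tried out on a concrete group. The sketch below builds a copy of $D_5$ whose common fixed point is a point $q$ other than the origin (each element is $x \mapsto q + M(x - q)$ with $M$ in $D_5$), forms the orbit of a sample point, and checks that the centroid of the orbit is fixed by every element. The construction, the names `make_group` and `centroid`, and the chosen points are assumptions of this illustration.

```python
import math

def rot(theta):
    """Matrix of the rotation through angle theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def make_group(n, q):
    """The 2n isometries x -> q + M(x - q), with M running through D_n."""
    x, y = rot(2 * math.pi / n), ((1.0, 0.0), (0.0, -1.0))
    mats, P = [], ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        mats += [P, matmul(P, y)]    # the rotation x^i and the reflection x^i y
        P = matmul(P, x)

    def act(M):
        def g(v):
            w = apply(M, (v[0] - q[0], v[1] - q[1]))
            return (q[0] + w[0], q[1] + w[1])
        return g

    return [act(M) for M in mats]

def centroid(points):
    """Centroid (6.4.8) of a finite list of points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))

G = make_group(5, (3.0, -2.0))
s = (0.3, 1.7)                     # a sample starting point
orbit = [g(s) for g in G]          # images of s under the elements of G
p = centroid(orbit)

print(all(close(g(p), p) for g in G))   # every element of G fixes the centroid
print(p)                                # close to (3.0, -2.0)
```

If some images coincide, each orbit point appears in the list the same number of times, so the average of the list still equals the centroid of the orbit.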
By combining Theorems 6.4.1 and 6.4.7 one obtains a description of the symmetry
groups of bounded figures in the plane.


Corollary 6.4.10 Let $G$ be a finite subgroup of the group $M$ of isometries of the plane.
If coordinates are chosen suitably, $G$ becomes one of the groups $C_n$ or $D_n$ described in
Theorem 6.4.1. □


6.5 DISCRETE GROUPS OF ISOMETRIES 


In this section we discuss groups of symmetries of unbounded figures such as the one depicted
in Figure 6.1.5. What I call the kaleidoscope principle can be used to construct a figure with
a given group of symmetries. You have probably looked through a kaleidoscope. One sees
a sector at the end of the tube, whose sides are bounded by two mirrors that run the length
of the tube and are placed at an angle $\theta$, such as $\theta = \pi/6$. One also sees the reflection of the
sector in each mirror, and then one sees the reflection of the reflection, and so on. There are
usually some bits of colored glass in the sector, whose reflections form a pattern.

There is a group involved. In the plane at the end of the kaleidoscope tube, let $\ell_1$ and
$\ell_2$ be the lines that bound the sector formed by the mirrors. The group is a dihedral group,
generated by the reflections $r_i$ about $\ell_i$. The product $r_1r_2$ of these reflections preserves
orientation and fixes the point of intersection of the two lines, so it is a rotation. Its angle of
rotation is $\pm 2\theta$.
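The angle $\pm 2\theta$ can be checked with matrices. The sketch below uses the standard matrix of the reflection about the line through the origin at angle $\alpha$ to the $e_1$-axis (it equals $\rho_{2\alpha}r$), takes $\ell_1$ to be the $e_1$-axis and $\ell_2$ the line at angle $\theta = \pi/6$, and multiplies the two reflections in both orders. Placing the mirrors this way, and the helper names, are assumptions of the illustration.

```python
import math

def rot(theta):
    """Matrix of the rotation through angle theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def refl(alpha):
    """Matrix of the reflection about the line through the origin at angle alpha."""
    c, s = math.cos(2 * alpha), math.sin(2 * alpha)
    return ((c, s), (s, -c))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

theta = math.pi / 6
r1, r2 = refl(0.0), refl(theta)     # reflections about the two mirror lines

print(close(matmul(r2, r1), rot(2 * theta)))    # r2 r1 is the rotation through  2*theta
print(close(matmul(r1, r2), rot(-2 * theta)))   # r1 r2 is the rotation through -2*theta
```

With $\theta = \pi/6$ the product is a rotation through $\pi/3$, which has order 6.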

One can use the same principle with any subgroup $G$ of $M$. We won't give precise
reasoning to show this, but the method can be made precise. We start with a random figure
$R$ in the plane. Every element $g$ of our group $G$ will move $R$ to a new position, call it $gR$.
The figure $F$ is the union of all the figures $gR$. An element $h$ of the group sends $gR$ to $hgR$,
which is also a part of $F$, so it sends $F$ to itself. If $R$ is sufficiently random, $G$ will be the
group of symmetries of $F$. As we know from the kaleidoscope, the figure $F$ is often very
attractive. The result of applying this procedure when $G$ is the group of symmetries of a
regular pentagon is shown below.


[Figure: a figure $F$ whose group of symmetries is that of a regular pentagon.]


Of course many different figures have the same group of symmetry. But it is interesting and 
instructive to describe the groups. We are going to present a rough classification, which will 
be refined in the exercises. 


Some subgroups of $M$ are too wild to have a reasonable geometry. For instance, if the
angle $\theta$ at which the mirrors in a kaleidoscope are placed were not a rational multiple of $2\pi$,
there would be infinitely many distinct reflections of the sector. We need to rule this
possibility out.


Definition 6.5.1. A group $G$ of isometries of the plane $P$ is discrete if it does not contain
arbitrarily small translations or rotations. More precisely, $G$ is discrete if there is a positive
real number $\epsilon$ so that:


(i) if an element of $G$ is the translation by a nonzero vector $a$, then the length of $a$ is at
least $\epsilon$: $|a| \geq \epsilon$, and

(ii) if an element of $G$ is the rotation through a nonzero angle $\theta$ about some point of the
plane, then the absolute value of $\theta$ is at least $\epsilon$: $|\theta| \geq \epsilon$.


Note: Since the translation vectors and the rotation angles form different sets, it might seem
more appropriate to have separate lower bounds for them. However, in this definition we
don't care about the best bounds for the vectors and the angles, so we choose $\epsilon$ small enough
to take care of both at the same time. □


The translations and rotations are all of the orientation-preserving isometries (6.3.4),
and the conditions apply to all of them. We don't impose a condition on the
orientation-reversing isometries. If $m$ is a glide with nonzero glide vector $v$, then $m^2$ is the translation
$t_{2v}$. So a lower bound on the translation vectors determines a bound for the glide vectors too.


There are three main tools for analyzing a discrete group $G$:

(6.5.2)
• the translation group $L$, a subgroup of the group $V$ of translation vectors,
• the point group $\overline{G}$, a subgroup of the orthogonal group $O_2$,
• an operation of $\overline{G}$ on $L$.


The Translation Group


The translation group $L$ of $G$ is the set of vectors $v$ such that the translation $t_v$ is in $G$.

(6.5.3)  $L = \{v \in V \mid t_v \in G\}$.


Since $t_vt_w = t_{v+w}$ and $t_v^{-1} = t_{-v}$, $L$ is a subgroup of the additive group $V^+$ of all translation
vectors. The bound $\epsilon$ on translations in $G$ bounds the lengths of the vectors in $L$:


(6.5.4)  Every nonzero vector $v$ in $L$ has length $|v| \geq \epsilon$.


• A subgroup $L$ of one of the additive groups $V^+$ or $\mathbb{R}^{2+}$ that satisfies condition (6.5.4) for
some $\epsilon > 0$ is called a discrete subgroup. (This is the definition made before for $\mathbb{R}^+$.)


A subgroup $L$ is discrete if and only if the distance between distinct vectors $a$ and $b$
of $L$ is at least $\epsilon$. This is true because the distance is the length of $b - a$, and $b - a$ is in $L$
because $L$ is a group. If (6.5.4) holds, then $|b - a| \geq \epsilon$. □


Theorem 6.5.5 Every discrete subgroup $L$ of $V^+$ or of $\mathbb{R}^{2+}$ is one of the following:


(a) the zero group: $L = \{0\}$.
(b) the set of integer multiples of a nonzero vector $a$:
    $L = \mathbb{Z}a = \{ma \mid m \in \mathbb{Z}\}$, or
(c) the set of integer combinations of two linearly independent vectors $a$ and $b$:
    $L = \mathbb{Z}a + \mathbb{Z}b = \{ma + nb \mid m, n \in \mathbb{Z}\}$.


Groups of the third type listed above are called lattices, and the generating set $(a, b)$ is called
a lattice basis.


(6.5.6)  A Lattice


Lemma 6.5.7 Let $L$ be a discrete subgroup of $V^+$ or $\mathbb{R}^{2+}$.


(a) A bounded region of the plane contains only finitely many points of $L$.
(b) If $L$ is not the trivial group, it contains a nonzero vector of minimal length.


Proof. (a) Since the elements of $L$ are separated by a distance at least $\epsilon$, a small square can
contain at most one point of $L$. A region of the plane is bounded if it is contained in some
large rectangle. We can cover any rectangle by finitely many small squares, each of which
contains at most one point of $L$.


(b) We say that a vector $v$ is a nonzero vector of minimal length of $L$ if $L$ contains no shorter
nonzero vector. To show that such a vector exists, we use the hypothesis that $L$ is not the
trivial group. There is some nonzero vector $a$ in $L$. Then the disk of radius $|a|$ about
the origin is a bounded region that contains $a$ and finitely many other nonzero points of $L$.
Some of those points will have minimal length. □


Given a basis $B = (u, w)$ of $\mathbb{R}^2$, we let $\Pi(B)$ denote the parallelogram with vertices
$0, u, w, u + w$. It consists of the linear combinations $ru + sw$ with $0 \leq r \leq 1$ and $0 \leq s \leq 1$.
We also denote by $\Pi'(B)$ the region obtained from $\Pi(B)$ by deleting the two edges
$[u, u + w]$ and $[w, u + w]$. It consists of the linear combinations $ru + sw$ with $0 \leq r < 1$ and
$0 \leq s < 1$.


Lemma 6.5.8 Let $B = (u, w)$ be a basis of $\mathbb{R}^2$, and let $L$ be the lattice of integer combinations
of $B$. Every vector $v$ in $\mathbb{R}^2$ can be written uniquely in the form $v = x + v_0$, with $x$ in $L$ and
$v_0$ in $\Pi'(B)$.


Proof. Since $B$ is a basis, every vector is a linear combination $ru + sw$, with real coefficients
$r$ and $s$. We take out their integer parts, writing $r = m + r_0$ and $s = n + s_0$, with $m$, $n$ integers
and $0 \leq r_0, s_0 < 1$. Then $v = x + v_0$, where $x = mu + nw$ is in $L$ and $v_0 = r_0u + s_0w$ is in
$\Pi'(B)$. There is just one way to do this. □
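The proof is constructive, and the decomposition can be computed: find the coefficients $r, s$ of $v$ in the basis $(u, w)$ and split off their integer parts. The following sketch uses Cramer's rule for the coefficients; the function name `decompose` and the sample basis are assumptions of the illustration.

```python
import math

def decompose(u, w, v):
    """Write v = x + v0 with x = m*u + n*w (m, n integers) and v0 in Pi'(B)."""
    det = u[0] * w[1] - u[1] * w[0]
    if det == 0:
        raise ValueError("u and w are not linearly independent")
    # Coefficients of v = r*u + s*w, by Cramer's rule.
    r = (v[0] * w[1] - v[1] * w[0]) / det
    s = (u[0] * v[1] - u[1] * v[0]) / det
    m, n = math.floor(r), math.floor(s)      # integer parts
    x = (m * u[0] + n * w[0], m * u[1] + n * w[1])
    v0 = (v[0] - x[0], v[1] - x[1])
    return (m, n), x, v0, (r - m, s - n)     # last entry: (r0, s0), each in [0, 1)

u, w = (2.0, 0.0), (1.0, 3.0)                # a sample basis
(m, n), x, v0, (r0, s0) = decompose(u, w, (7.5, -4.0))

print((m, n), x, v0)
print(0 <= r0 < 1 and 0 <= s0 < 1)
```

For this sample, $r = 26.5/6$ and $s = -8/6$, so $m = 4$, $n = -2$, $x = (6, -6)$, and $v_0 = (1.5, 2)$. Floating-point rounding could misplace the integer part when $r$ or $s$ is extremely close to an integer; the sketch ignores that edge case.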


Proof of Theorem 6.5.5 It is enough to consider a discrete subgroup $L$ of $\mathbb{R}^{2+}$. The case that
$L$ is the zero group is included in the list. If $L \neq \{0\}$, there are two possibilities:


Case 1: All vectors in $L$ lie on a line $\ell$ through the origin.


Then $L$ is a subgroup of the additive group of $\ell^+$, which is isomorphic to $\mathbb{R}^+$. Lemma 6.4.6
shows that $L$ has the form $\mathbb{Z}a$.


Case 2: The elements of $L$ do not lie on a line.


In this case, $L$ contains independent vectors $a'$ and $b'$, and then $B' = (a', b')$ is a basis of $\mathbb{R}^2$.
We must show that there is a lattice basis for $L$.

We first consider the line $\ell$ spanned by $a'$. The subgroup $L \cap \ell$ of $\ell^+$ is discrete, and $a'$
isn't zero. So by what has been proved in Case 1, $L \cap \ell$ has the form $\mathbb{Z}a$ for some vector $a$.
We adjust coordinates and rescale so that $a$ becomes the vector $(1, 0)^t$.

Next, we replace $b' = (b_1', b_2')^t$ by $-b'$ if necessary, so that $b_2'$ becomes positive. We
look for a vector $b = (b_1, b_2)^t$ in $L$ with $b_2$ positive and otherwise as small as possible. A
priori, we have infinitely many elements to inspect. However, since $b'$ is in $L$, we only need
to inspect the elements $b$ such that $0 < b_2 \leq b_2'$. Moreover, we may add a multiple of $a$ to
$b$, so we may also assume that $0 \leq b_1 < 1$. When this is done, $b$ will be in a bounded region
that contains finitely many elements of $L$. We look through this finite set to find the required
element $b$, and we show that $B = (a, b)$ is a lattice basis for $L$.

Let $\tilde{L} = \mathbb{Z}a + \mathbb{Z}b$. Then $\tilde{L} \subset L$. We must show that every element of $L$ is in $\tilde{L}$, and
according to Lemma 6.5.8, applied to the lattice $\tilde{L}$, it is enough to show that the only element
of $L$ in the region $\Pi'(B)$ is the zero vector. Let $c = (c_1, c_2)^t$ be a point of $L$ in that region,
so that $0 \leq c_1 < 1$ and $0 \leq c_2 < b_2$. Since $b_2$ was chosen minimal, $c_2 = 0$, and $c$ is on the line
$\ell$. Then $c$ is an integer multiple of $a$, and since $0 \leq c_1 < 1$, $c = 0$. □


The Point Group 


We turn now to the second tool for analyzing a discrete group of isometries. We choose
coordinates, and go back to the homomorphism $\pi: M \to O_2$ whose kernel is the group $T$
of translations (6.3.10). When we restrict this homomorphism to a discrete subgroup $G$, we
obtain a homomorphism


(6.5.9)  $\pi|_G : G \to O_2$.


The point group $\overline{G}$ is the image of $G$ in the orthogonal group $O_2$.

It is important to make a clear distinction between elements of the group $G$ and
those of its point group $\overline{G}$. So to avoid confusion, we will put bars over symbols when they
represent elements of $\overline{G}$. For $g$ in $G$, $\overline{g}$ will be an orthogonal operator.

By definition, a rotation $\overline{\rho}_\theta$ is in $\overline{G}$ if $G$ contains an element of the form $t_a\rho_\theta$, and this
is a rotation through the same angle $\theta$ about some point of the plane (6.3.5). The inverse
image in $G$ of an element $\overline{\rho}_\theta$ of $\overline{G}$ consists of the elements of $G$ that are rotations through
the angle $\theta$ about various points of the plane.

Similarly, let $\ell$ denote the line of reflection of $\rho_\theta r$. As we have noted before, its angle
with the $e_1$-axis is $\tfrac{1}{2}\theta$ (5.1.17). The point group $\overline{G}$ contains $\overline{\rho}_\theta\overline{r}$ if there is an element $t_a\rho_\theta r$ in
$G$, and $t_a\rho_\theta r$ is a reflection or a glide reflection along a line parallel to $\ell$ (6.3.8). The inverse
image of $\overline{\rho}_\theta\overline{r}$ consists of all of the elements of $G$ that are reflections or glides along lines
parallel to $\ell$. To sum up:


• The point group $\overline{G}$ records the angles of rotation and the slopes of the glide lines and the
lines of reflection, of elements of $G$.


Proposition 6.5.10 A discrete subgroup $G$ of $O_2$ is finite, and is therefore either cyclic or
dihedral.


Proof. Since $G$ contains no small rotations, the set $\Gamma$ of real numbers $\theta$ such that $\rho_\theta$ is in $G$
is a discrete subgroup of the additive group $\mathbb{R}^+$ that contains $2\pi$. Lemma 6.4.6 tells us that
$\Gamma$ has the form $\mathbb{Z}\theta$, where $\theta = 2\pi/n$ for some integer $n$. At this point, the proof of Theorem
6.4.1 carries over. □


The Crystallographic Restriction 


If the translation group of a discrete group of isometries $G$ is the trivial group, the restriction
of $\pi$ to $G$ will be injective. In this case $G$ will be isomorphic to its point group $\overline{G}$, and will
be cyclic or dihedral. The next proposition is our third tool for analyzing infinite discrete
groups. It relates the point group to the translation group.

Unless an origin is chosen, the orthogonal group $O_2$ doesn't operate on the plane $P$.
But it does operate on the space $V$ of translation vectors.


Proposition 6.5.11 Let $G$ be a discrete subgroup of $M$. Let $a$ be an element of its translation
group $L$, and let $\overline{g}$ be an element of its point group $\overline{G}$. Then $\overline{g}(a)$ is in $L$.


We can restate this proposition by saying that the elements of $\overline{G}$ map $L$ to itself. So $\overline{G}$ is
contained in the group of symmetries of $L$, when $L$ is regarded as a figure in the plane $V$.


Proof of Proposition 6.5.11 Let $a$ and $g$ be elements of $L$ and $G$, respectively, let $\overline{g}$ be the
image of $g$ in $\overline{G}$, and let $a' = \overline{g}(a)$. We will show that $t_{a'}$ is the conjugate $gt_ag^{-1}$. This will
show that $t_{a'}$ is in $G$, and therefore that $a'$ is in $L$. We write $g = t_b\varphi$. Then $\varphi$ is in $O_2$ and
$\overline{g} = \varphi$. So $a' = \varphi(a)$. Using the formulas (6.2.8), we find:


$gt_ag^{-1} = (t_b\varphi)t_a(\varphi^{-1}t_{-b}) = t_bt_{a'}\varphi\varphi^{-1}t_{-b} = t_{a'}$. □
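The conjugation formula can be tested on sample data by applying $g^{-1}$, then $t_a$, then $g$ to a point and comparing with translation by $a' = \varphi(a)$. In the sketch below, $g = t_b\varphi$ with $\varphi$ a rotation, and $\varphi^{-1}$ is computed as the transpose; all names and values are assumptions of the illustration.

```python
import math

def rot(theta):
    """Matrix of the rotation through angle theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def transpose(M):
    """For an orthogonal matrix, the transpose is the inverse."""
    return ((M[0][0], M[1][0]), (M[0][1], M[1][1]))

b, phi = (2.0, -1.0), rot(2 * math.pi / 3)   # g = t_b phi (sample data)
a = (1.0, 0.0)                               # a translation vector
a_prime = apply(phi, a)                      # a' = phi(a)

def g(x):
    y = apply(phi, x)
    return (y[0] + b[0], y[1] + b[1])

def g_inv(x):
    # g^{-1} = phi^{-1} t_{-b}
    return apply(transpose(phi), (x[0] - b[0], x[1] - b[1]))

def conjugate(x):
    """g t_a g^{-1} applied to x."""
    y = g_inv(x)
    return g((y[0] + a[0], y[1] + a[1]))

def close(u, v, tol=1e-12):
    return all(abs(p - q) < tol for p, q in zip(u, v))

points = [(0.0, 0.0), (3.0, 5.0), (-1.5, 0.25)]
print(all(close(conjugate(x), (x[0] + a_prime[0], x[1] + a_prime[1])) for x in points))
```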


Note: It is important to understand that the group G does not operate on its translation 
group L. Indeed, it makes no sense to ask whether G operates on L, because the elements 
of G are isometries of the plane P, while L is a subset of V. Unless an origin is fixed, P is not
the same as V. If we fix the origin in P, we can identify P with V. Then the question makes
sense. We may ask: Is there a point of P so that with that point as the origin, the elements
of G carry L to itself? Sometimes yes, sometimes no. That depends on the group. □
The next theorem describes the point groups that can occur when the translation group
L is not trivial.


Theorem 6.5.12 Crystallographic Restriction. Let L be a discrete subgroup of $V^+$ or $\mathbb{R}^{2+}$,
and let $H \subset O_2$ be a subgroup of the group of symmetries of L. Suppose that L is not the
trivial group. Then


(a) every rotation in H has order 1, 2, 3, 4, or 6, and
(b) H is one of the groups $C_n$ or $D_n$, and n = 1, 2, 3, 4, or 6.


In particular, rotations of order 5 are ruled out. There is no wallpaper pattern with five-fold
rotational symmetry. ("Quasi-periodic" patterns with five-fold symmetry do exist. See, for
example, [Senechal].)


Proof of the Crystallographic Restriction We prove (a). Part (b) follows from (a) and from
Theorem 6.4.1. Let $\rho$ be a rotation in H with angle $\theta$, and let a be a nonzero vector in L
of minimal length. Since H operates on L, $\rho(a)$ is also in L. Then $b = \rho(a) - a$ is in L
too, and since a has minimal length, $|b| \ge |a|$. Looking at the figure below, one sees that
$|b| < |a|$ when $\theta < 2\pi/6$. So we must have $\theta \ge 2\pi/6$. It follows that the group H is discrete,
hence finite, and that $\rho$ has order $\le 6$.


[Figure: the vectors a and $\rho(a)$, separated by the angle $\theta$, and their difference $b = \rho(a) - a$.]


The case that $\theta = 2\pi/5$ can be ruled out too, because for that angle, the element $b' =
\rho^2(a) + a$ is shorter than a:


[Figure: the vectors a and $\rho^2(a)$, and their sum $b' = \rho^2(a) + a$.] □
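
The two length comparisons in this proof can be checked numerically. The following is a minimal sketch, not part of the text's argument: it scales the minimal vector a to length 1, and for a rotation of order n it tests both $b = \rho(a) - a$ and $b' = \rho^2(a) + a$ (the proof needs $b'$ only for n = 5, but both vectors lie in L whenever $\rho$ carries L to itself). The function names `rotate` and `short_vector` and the tolerance `eps` are assumptions made for this illustration.

```python
import math

def rotate(v, theta):
    """Rotate the plane vector v counterclockwise through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def short_vector(n, eps=1e-9):
    """Return a nonzero vector shorter than a that a rotation of order n
    would force into L, or None if neither test vector is shorter."""
    a = (1.0, 0.0)                               # minimal vector, scaled to length 1
    theta = 2 * math.pi / n
    ra = rotate(a, theta)                        # rho(a)
    rra = rotate(ra, theta)                      # rho^2(a)
    b = (ra[0] - a[0], ra[1] - a[1])             # b  = rho(a) - a
    b_prime = (rra[0] + a[0], rra[1] + a[1])     # b' = rho^2(a) + a
    for w in (b, b_prime):
        if eps < math.hypot(*w) < 1 - eps:       # nonzero and strictly shorter than a
            return w
    return None

# Orders for which no contradiction with the minimality of a appears.
allowed = [n for n in range(1, 13) if short_vector(n) is None]
print(allowed)   # [1, 2, 3, 4, 6]
```

Only the orders 1, 2, 3, 4, and 6 survive, in agreement with part (a) of the theorem. The tolerance is there because lengths that are exactly 1 (orders 3 and 6) are computed in floating point.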


6.6 PLANE CRYSTALLOGRAPHIC GROUPS 


We go back to our discrete group of isometries $G \subset M$. We have seen that when L is the
trivial group, G is cyclic or dihedral. The discrete groups G such that L is infinite cyclic 
(6.5.5)(b) are the symmetry groups of frieze patterns such as those shown in (6.1.3), (6.1.4). 
We leave the classification of those groups as an exercise. 

When L is a lattice, G is called a two-dimensional crystallographic group. These crys- 
tallographic groups are the symmetry groups of two-dimensional crystals such as graphene. 
We imagine a crystal to be infinitely large. Then the fact that the molecules are arranged 
regularly implies that they form an array having two independent translational symmetries. 
A wallpaper pattern also repeats itself in two different directions — once along the strips of 
paper because the pattern is printed using a roller, and a second time because strips of paper 
are glued to the wall side by side. The crystallographic restriction limits the possibilities and 
allows one to classify crystallographic groups into 17 types. Representative patterns with the
various types of symmetry are illustrated in Figure (6.6.2). 

The point group $\overline{G}$ and the translation group L do not determine the group G
completely. Things are complicated by the fact that a reflection in $\overline{G}$ needn't be the image
of a reflection in G. It may be represented in G only by glides, as in the brick pattern
that is illustrated below. This pattern (my favorite) is relatively subtle because its group of
symmetries doesn't contain a reflection. It has rotational symmetries with angle $\pi$ about
the center of each brick. All of these rotations represent the same element $\overline{\rho}_\pi$ of the
point group $\overline{G}$. There are no nontrivial rotational symmetries with angles other than 0
and $\pi$. The pattern also has glide symmetry along the dashed line drawn in the figure, so
$\overline{G} = D_2 = \{\overline{1}, \overline{\rho}_\pi, \overline{r}, \overline{\rho}_\pi\overline{r}\}$.


One can determine the point group of a pattern fairly easily, in two steps: One looks
first for rotational symmetries. They are usually relatively easy to find. A rotation $\overline{\rho}_\theta$ in the
point group $\overline{G}$ is represented by a rotation with the same angle in the group G of symmetries
of the pattern. When the rotational symmetries have been found, one will know the integer
n such that the point group is $C_n$ or $D_n$. Then to distinguish $D_n$ from $C_n$, one looks to see
if the pattern has reflection or glide symmetry. If it does, $\overline{G} = D_n$, and if not, $\overline{G} = C_n$.


Plane Crystallographic Groups with a Fourfold Rotation in the Point Group 


As an example of the methods used to classify discrete groups of isometries, we analyze
groups whose point groups are $C_4$ or $D_4$.

Let G be such a group, let $\overline{\rho}$ denote the rotation with angle $\pi/2$ in $\overline{G}$, and let L be the
lattice of G, the set of vectors v such that $t_v$ is in G.


Lemma 6.6.1 The lattice L is square. 


Proof. We choose a nonzero vector a in L of minimal length. The point group operates on
L, so $\overline{\rho}(a) = b$ is in L and is orthogonal to a. We claim that (a, b) is a lattice basis for L.
Suppose not. Then according to Lemma 6.5.8, there will be a nonzero point of L in the region
$\Pi'$ consisting of the points $r_1 a + r_2 b$ with $0 \le r_i < 1$. Such a point w will be at a distance less
than $|a|$ from one of the four vertices 0, a, b, a + b of the square. Call that vertex v. Then
v − w is also in L, and $|v - w| < |a|$. This contradicts the choice of a. □


We choose coordinates and rescale so that a and b become the standard basis vectors
$e_1$ and $e_2$. Then L becomes the lattice of vectors with integer coordinates, and $\Pi'$ becomes
the set of vectors $(s, t)^t$ with $0 \le s < 1$ and $0 \le t < 1$. This determines coordinates in the
plane P up to a translation.
(6.6.2) Sample Patterns for the 17 Plane Crystallographic Groups. [Figure not reproduced.]

The orthogonal operators on V that send L to itself form the dihedral group $D_4$
generated by the rotation $\overline{\rho}$ through the angle $\pi/2$ and the standard reflection $\overline{r}$. Our
assumption is that $\overline{\rho}$ is in $\overline{G}$. If $\overline{r}$ is also in $\overline{G}$, then $\overline{G}$ is the dihedral group $D_4$. If not, $\overline{G}$ is
the cyclic group $C_4$. We describe the group G when $\overline{G}$ is $C_4$ first. Let g be an element of G
whose image in $\overline{G}$ is the rotation $\overline{\rho}$. Then g is a rotation through the angle $\pi/2$ about some
point p in the plane. We translate coordinates in the plane P so that the point p becomes
the origin. In this coordinate system, G contains the rotation $\rho = \rho_{\pi/2}$ about the origin.


Proposition 6.6.3 Let G be a plane crystallographic group whose point group $\overline{G}$ is the cyclic
group $C_4$. With coordinates chosen so that L is the lattice of points with integer coordinates,
and so that $\rho = \rho_{\pi/2}$ is an element of G, the group G consists of the products $t_v\rho^i$, with v
in L and $0 \le i < 4$:

$G = \{\, t_v\rho^i \mid v \in L \,\}$.


Proof. Let $G'$ denote the set of elements of the form $t_v\rho^i$ with v in L. We must show that
$G' = G$. By definition of L, $t_v$ is in G, and also $\rho$ is in G. So $t_v\rho^i$ is in G, and therefore $G'$
is a subset of G.

To prove the opposite inclusion, let g be any element of G. Since the point group
$\overline{G}$ is $C_4$, every element of G preserves orientation. So g has the form $g = t_u\rho_\alpha$ for some
translation vector u and some angle $\alpha$. The image of this element in the point group is $\overline{\rho}_\alpha$,
so $\alpha$ is a multiple of $\pi/2$, and $\rho_\alpha = \rho^i$ for some i. Since $\rho$ is in G, $g\rho^{-i} = t_u$ is in G and u is
in L. Therefore g is in $G'$. □
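
The description $G = \{t_v\rho^i\}$ lends itself to a concrete model. The sketch below is an illustration under stated assumptions, not the text's construction: an element $t_v\rho^i$ is stored as a pair `(v, i)` with v an integer vector, and the product rule $(t_v\rho^i)(t_w\rho^j) = t_{v + \rho^i w}\,\rho^{i+j}$ is checked against composition of the corresponding maps of the plane. The helper names `rot`, `apply`, `mul`, and `inv` are hypothetical.

```python
import itertools

def rot(w, i):
    """Apply rho^i, the rotation through i*pi/2 about the origin, to w."""
    x, y = w
    for _ in range(i % 4):
        x, y = -y, x
    return (x, y)

def apply(g, p):
    """Apply g = t_v rho^i to the point p: rotate first, then translate."""
    v, i = g
    x, y = rot(p, i)
    return (x + v[0], y + v[1])

def mul(g, h):
    """Product rule (t_v rho^i)(t_w rho^j) = t_{v + rho^i w} rho^{i+j}."""
    (v, i), (w, j) = g, h
    rw = rot(w, i)
    return ((v[0] + rw[0], v[1] + rw[1]), (i + j) % 4)

def inv(g):
    """Inverse: (t_v rho^i)^{-1} = t_{-rho^{-i} v} rho^{-i}."""
    v, i = g
    w = rot(v, -i)
    return ((-w[0], -w[1]), (-i) % 4)

# A finite sample of group elements and of test points.
vectors = list(itertools.product(range(-1, 2), repeat=2))
sample = [(v, i) for v in vectors for i in range(4)]
points = [(3, -2), (0, 0), (1, 5)]

product_ok = all(apply(mul(g, h), p) == apply(g, apply(h, p))
                 for g in sample for h in sample for p in points)
inverse_ok = all(mul(g, inv(g)) == ((0, 0), 0) for g in sample)
print(product_ok, inverse_ok)   # True True
```

Because the translation part of every product is again an integer vector, the set of pairs is closed under multiplication and inversion, which is the computational content of the inclusion $G' \subset G$ being a subgroup.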


We now consider the case that the point group $\overline{G}$ is $D_4$.


Proposition 6.6.4 Let G be a plane crystallographic group whose point group $\overline{G}$ is the
dihedral group $D_4$. Let coordinates be chosen so that L is the lattice of points with integer
coordinates and so that $\rho = \rho_{\pi/2}$ is an element of G. Also, let c denote the vector $(\tfrac12, \tfrac12)^t$.
There are two possibilities:


(a) The elements of G are the products $t_v\varphi$ where v is in L and $\varphi$ is in $D_4$,
$G = \{t_v\rho^i \mid v \in L\} \cup \{t_v\rho^i r \mid v \in L\}$, or


(b) the elements of G are products $t_x\varphi$, with $\varphi$ in $D_4$. If $\varphi$ is a rotation, then x is in L, and
if $\varphi$ is a reflection, then x is in the coset c + L:


$G = \{t_v\rho^i \mid v \in L\} \cup \{t_u\rho^i r \mid u \in c + L\}$.


Proof. Let H be the subset of orientation-preserving isometries in G. This is a subgroup
of G whose lattice of translations is L, and which contains $\rho$. So its point group is $C_4$.
Proposition 6.6.3 tells us that H consists of the elements $t_v\rho^i$, with v in L.

The point group also contains the reflection $\overline{r}$. We choose an element g in G such that
$\overline{g} = \overline{r}$. It will have the form $g = t_u r$ for some vector u, but we don't know whether or not u
is in L. Analyzing this case will require a bit of fiddling. Say that $u = (p, q)^t$.

We can multiply g on the left by a translation $t_v$ in G (i.e., v in L), to move u into the
region $\Pi'$ of points with $0 \le p, q < 1$. Let's suppose this has been done.
We compute with $g = t_u r$, using the formulas (6.3.3):

$g^2 = t_u r\, t_u r = t_{u+ru}$ and $(g\rho)^2 = (t_u r\rho)^2 = t_{u+r\rho u}$.


These are elements of G, so $u + ru = (2p, 0)^t$ and $u + r\rho u = (p - q, q - p)^t$ are in the
lattice L. They are vectors with integer coordinates. Since $0 \le p, q < 1$ and 2p is an integer,
p is either 0 or $\tfrac12$. Since p − q is also an integer, q = 0 if p = 0 and $q = \tfrac12$ if $p = \tfrac12$. So
there are only two possibilities for u: Either $u = (0, 0)^t$, or $u = c = (\tfrac12, \tfrac12)^t$. In the first case,
g = r, so G contains a reflection. This is case (a) of the proposition. The second possibility is
case (b).
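
The "bit of fiddling" amounts to two integrality conditions on u, and they can be tested by brute force. The sketch below is a finite check rather than a proof: it runs over the vectors $u = (p, q)^t$ in $\Pi'$ whose coordinates are multiples of 1/den, using exact rational arithmetic, and keeps those for which $u + ru$ and $u + r\rho u$ have integer coordinates. The names `r`, `rho`, `add`, `in_lattice`, and `admissible`, and the choice den = 12, are assumptions of this illustration.

```python
from fractions import Fraction

def r(v):
    """Standard reflection about the e1-axis."""
    return (v[0], -v[1])

def rho(v):
    """Rotation through the angle pi/2 about the origin."""
    return (-v[1], v[0])

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def in_lattice(v):
    """True if both coordinates are integers."""
    return all(x.denominator == 1 for x in v)

def admissible(den):
    """Vectors u = (p, q) with 0 <= p, q < 1 on the grid of mesh 1/den
    such that u + r(u) and u + r(rho(u)) both lie in the integer lattice."""
    found = []
    for i in range(den):
        for j in range(den):
            u = (Fraction(i, den), Fraction(j, den))
            if in_lattice(add(u, r(u))) and in_lattice(add(u, r(rho(u)))):
                found.append(u)
    return found

print(admissible(12))
# [(Fraction(0, 1), Fraction(0, 1)), (Fraction(1, 2), Fraction(1, 2))]
```

On this grid only u = 0 and u = c remain, matching cases (a) and (b) of the proposition.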


6.7 ABSTRACT SYMMETRY: GROUP OPERATIONS 


The concept of symmetry can be applied to things other than geometric figures. Complex
conjugation $(a + bi) \rightsquigarrow (a - bi)$, for instance, may be thought of as a symmetry of the complex
numbers. Since complex conjugation is compatible with addition and multiplication, it is
called an automorphism of the field $\mathbb{C}$. Geometrically, it is the bilateral symmetry of the
complex plane about the real axis, but the statement that it is an automorphism refers to its
algebraic structure. The field $F = \mathbb{Q}[\sqrt2]$ whose elements are the real numbers of the form
$a + b\sqrt2$, with a and b rational, also has an automorphism, one that sends $a + b\sqrt2 \rightsquigarrow a - b\sqrt2$.
This isn't a geometric symmetry. Another example of abstract "bilateral" symmetry is given
by a cyclic group A of order 3. It has an automorphism that interchanges the two elements
different from the identity.
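
That the map $a + b\sqrt2 \rightsquigarrow a - b\sqrt2$ is compatible with addition and multiplication can be spot-checked with exact arithmetic. In this small sketch an element $a + b\sqrt2$ is stored as the pair (a, b) of rationals; the representation and the names `add`, `mul`, and `conj` are assumptions made for the illustration, and a check on a few samples is of course not a proof.

```python
from fractions import Fraction
from itertools import product

def add(u, v):
    """(a + b*sqrt2) + (c + d*sqrt2)."""
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    """(a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2."""
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)

def conj(u):
    """The map a + b*sqrt2 -> a - b*sqrt2."""
    return (u[0], -u[1])

values = [Fraction(n, d) for n in (-2, 0, 1, 3) for d in (1, 2)]
elements = list(product(values, repeat=2))

additive = all(conj(add(u, v)) == add(conj(u), conj(v))
               for u in elements for v in elements)
multiplicative = all(conj(mul(u, v)) == mul(conj(u), conj(v))
                     for u in elements for v in elements)
print(additive, multiplicative)   # True True
```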

The set of automorphisms of an algebraic structure X, such as a group or a field, forms 
a group, the law of composition being composition of maps. Each automorphism should be 
thought of as a symmetry of X, in the sense that it is a permutation of the elements of X that 
is compatible with its algebraic structure. But the structure in this case is algebraic instead of 
geometric. 

So the words "automorphism" and "symmetry" are more or less synonymous, except
that "automorphism" is used to describe a permutation of a set that preserves an algebraic
structure, while "symmetry" often, though not always, refers to a permutation that preserves
a geometric structure.

Both automorphisms and symmetries are special cases of the more general concept of
a group operation. An operation of a group G on a set S is a rule for combining an element
g of G and an element s of S to get another element of S. In other words, it is a map
$G \times S \to S$. For the moment we denote the result of applying this law to elements g and s
by $g * s$. An operation is required to satisfy the following axioms:


(6.7.1)

(a) $1 * s = s$ for all s in S. (Here 1 is the identity of G.)
(b) associative law: $(gg') * s = g * (g' * s)$, for all g and $g'$ in G and all s in S.


We usually omit the asterisk, and write the operation multiplicatively, as $g, s \rightsquigarrow gs$. With
multiplicative notation, the axioms are $1s = s$ and $(gg')s = g(g's)$.
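
For a finite group and a finite set, the two axioms can be verified by exhaustive search. The sketch below takes G to be the symmetric group on S = {0, 1, 2}, with a permutation stored as a tuple g where g[s] is the image of s; these conventions and the function names are assumptions for the illustration. It also tests a second rule, $g * s = g(g(s))$, to show that a plausible-looking rule can satisfy axiom (a) and still fail the associative law.

```python
from itertools import permutations

S = (0, 1, 2)
G = list(permutations(S))      # g[s] is the image of s under g
identity = (0, 1, 2)

def compose(g, h):
    """The product gh: apply h first, then g."""
    return tuple(g[h[s]] for s in S)

def satisfies_axioms(act):
    """Check axioms (a) and (b) of (6.7.1) for the rule act(g, s)."""
    axiom_a = all(act(identity, s) == s for s in S)
    axiom_b = all(act(compose(g, h), s) == act(g, act(h, s))
                  for g in G for h in G for s in S)
    return axiom_a, axiom_b

def natural(g, s):
    """The usual operation of a permutation on an index."""
    return g[s]

def squared(g, s):
    """Apply g twice. Not an operation, since (gh)^2 need not equal g^2 h^2."""
    return g[g[s]]

print(satisfies_axioms(natural))   # (True, True)
print(satisfies_axioms(squared))   # (True, False)
```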
Examples of sets on which a group operates can be found manywhere,¹ and most often,
it will be clear that the axioms for an operation hold. The group M of isometries of the plane
operates on the set of points of the plane. It also operates on the set of lines in the plane and
on the set of triangles in the plane. The symmetric group $S_n$ operates on the set of indices
{1, 2, ..., n}.

The reason that such a law is called an operation is this: If we fix an element g of G
but let s vary in S, then left multiplication by g (or operation of g) defines a map from S to
itself. We denote this map, which describes the way the element g operates, by $m_g$:


(6.7.2) $m_g : S \to S$


is the map defined by $m_g(s) = gs$. It is a permutation of S, a bijective map, because it has
the inverse function $m_{g^{-1}}$: multiplication by $g^{-1}$.


• Given an operation of a group G on a set S, an element s of S will be sent to various other
elements by the group operation. We collect together those elements, obtaining a subset
called the orbit $O_s$ of s:


(6.7.3) $O_s = \{s' \in S \mid s' = gs \text{ for some } g \text{ in } G\}$.


When the group M of isometries of the plane operates on the set S of triangles in the
plane, the orbit $O_\Delta$ of a given triangle $\Delta$ is the set of all triangles congruent to $\Delta$. Another
orbit was introduced when we proved the existence of a fixed point for the operation of a
finite group on the plane (6.4.7).

The orbits for a group action are equivalence classes for the equivalence relation


(6.7.4) $s \sim s'$ if $s' = gs$, for some g in G.


So if $s \sim s'$, that is, if $s' = gs$ for some g in G, then the orbits of s and of $s'$ are the same.
Since they are equivalence classes:


(6.7.5) The orbits partition the set S.


The group operates independently on each orbit. For example, the set of triangles of 
the plane is partitioned into congruence classes, and an isometry permutes each congruence 
class separately. 

If S consists of just one orbit, the operation of G is called transitive. This means
that every element of S is carried to every other one by some element of the group. The
symmetric group $S_n$ operates transitively on the set of indices {1, ..., n}. The group M of
isometries of the plane operates transitively on the set of points of the plane, and it operates
transitively on the set of lines. It does not operate transitively on the set of triangles.
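
Orbits are easy to compute for a finite permutation group. The following sketch generates a small group from two permutations of {0, ..., 4} (a 3-cycle and a transposition, chosen only for illustration), collects the orbits, and tests transitivity. The function names `compose`, `generate`, and `orbits` are hypothetical, and the closure loop relies on the group being finite, so that inverses are products of generators.

```python
def compose(g, h):
    """The product gh of two permutations: apply h first, then g."""
    return tuple(g[h[i]] for i in range(len(h)))

def generate(gens):
    """The subgroup generated by gens, found by closing under left
    multiplication by the generators."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for x in gens:
            xg = compose(x, g)
            if xg not in group:
                group.add(xg)
                frontier.append(xg)
    return group

def orbits(group, points):
    """List the orbits O_s = {gs : g in G}, each one once."""
    seen, result = set(), []
    for s in points:
        if s not in seen:
            orbit = frozenset(g[s] for g in group)
            result.append(orbit)
            seen |= orbit
    return result

three_cycle = (1, 2, 0, 3, 4)      # 0 -> 1 -> 2 -> 0
swap = (0, 1, 2, 4, 3)             # 3 <-> 4
G = generate([three_cycle, swap])
parts = orbits(G, range(5))

print(len(G))                               # 6
print([sorted(o) for o in parts])           # [[0, 1, 2], [3, 4]]
print(len(parts) == 1)                      # False: the operation is not transitive
```

The two orbits are disjoint and cover the whole set, which is statement (6.7.5) in this example.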


• The stabilizer of an element s of S is the set of group elements that leave s fixed. It is a
subgroup of G that we often denote by $G_s$:


(6.7.6) $G_s = \{g \in G \mid gs = s\}$.


¹ While writing a book, the mathematician Masayoshi Nagata decided that the English language needed this
word; then he actually found it in a dictionary.
For instance, in the operation of the group M on the set of points of the plane, the stabilizer
of the origin is isomorphic to the group $O_2$ of orthogonal operators. The stabilizer of the
index n for the operation of the symmetric group $S_n$ is isomorphic to the subgroup $S_{n-1}$
of permutations of {1, ..., n−1}. Or, if S is the set of triangles in the plane, the stabilizer
of a particular equilateral triangle $\Delta$ is its group of symmetries, a subgroup of M that is
isomorphic to the dihedral group $D_3$.


Note: It is important to be clear about the following distinction: When we say that an isometry
m stabilizes a triangle $\Delta$, we don't mean that m fixes the points of $\Delta$. The only isometry that
fixes every point of a triangle is the identity. We mean that in permuting the set of triangles,
m carries $\Delta$ to itself. □


Just as the kernel K of a group homomorphism $\varphi : G \to G'$ tells us when two elements
x and y of G have the same image, namely, if $x^{-1}y$ is in K, the stabilizer $G_s$ of an element s
of S tells us when two elements x and y of G act in the same way on s.


Proposition 6.7.7 Let S be a set on which a group G operates, let s be an element of S, and
let H be the stabilizer of s.


(a) If a and b are elements of G, then as = bs if and only if $a^{-1}b$ is in H, and this is true if
and only if b is in the coset aH.


(b) Suppose that $as = s'$. The stabilizer $H'$ of $s'$ is a conjugate subgroup:
$H' = aHa^{-1} = \{g \in G \mid g = aha^{-1} \text{ for some } h \text{ in } H\}$.

Proof. (a) as = bs if and only if $s = a^{-1}bs$.


(b) If g is in $aHa^{-1}$, say $g = aha^{-1}$ with h in H, then $gs' = (aha^{-1})(as) = ahs = as = s'$,
so g stabilizes $s'$. This shows that $aHa^{-1} \subset H'$. Since $s = a^{-1}s'$, we can reverse the roles
of s and $s'$, to conclude that $a^{-1}H'a \subset H$, which implies that $H' \subset aHa^{-1}$. Therefore
$H' = aHa^{-1}$. □


Note: Part (b) of the proposition explains a phenomenon that we have seen several times
before: When $as = s'$, a group element g fixes s if and only if $aga^{-1}$ fixes $s'$.
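
Both parts of Proposition 6.7.7 can be confirmed by direct enumeration in a small case. The sketch below assumes G is the symmetric group on {0, 1, 2, 3} operating on the indices, with s = 0 and a particular element a chosen for illustration; the helper names are hypothetical.

```python
from itertools import permutations

n = 4
G = list(permutations(range(n)))       # g[i] is the image of the index i

def compose(g, h):
    """The product gh: apply h first, then g."""
    return tuple(g[h[i]] for i in range(n))

def inverse(g):
    inv = [0] * n
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def stabilizer(s):
    return {g for g in G if g[s] == s}

s = 0
H = stabilizer(s)
a = (2, 0, 3, 1)                       # an element with a(s) = 2
s_prime = a[s]

# Part (a): as = bs exactly when a^{-1} b lies in H.
part_a = all((g[s] == b[s]) == (compose(inverse(g), b) in H)
             for g in G for b in G)

# Part (b): the stabilizer of s' = as is the conjugate subgroup a H a^{-1}.
conjugate = {compose(compose(a, h), inverse(a)) for h in H}
part_b = conjugate == stabilizer(s_prime)

print(len(H), part_a, part_b)          # 6 True True
```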


6.8 THE OPERATION ON COSETS 


Let H be a subgroup of a group G. As we know, the left cosets aH partition G. We often 
denote the set of left cosets of H in G by G/H, copying this from the notation used for 
quotient groups when the subgroup is normal (2.12.1), and we use the bracket notation [C] 
for a coset C, when it is considered as an element of the set G/H. 

The set of cosets G/H is not a group unless H is a normal subgroup. However, 


• The group G operates on G/H in a natural way.


The operation is quite obvious: If g is an element of the group, and C is a coset, then
g[C] is defined to be the coset [gC], where $gC = \{gc \mid c \in C\}$. Thus if [C] = [aH], then
g[C] = [gaH]. The next proposition is elementary.
Proposition 6.8.1 Let H be a subgroup of a group G.


(a) The operation of G on the set G/H of cosets is transitive.
(b) The stabilizer of the coset [H] is the subgroup H. □


Note the distinction once more: Multiplication by an element h of H does not act trivially
on the elements of the coset H, but it sends the coset [H] to itself.

Please work carefully through the next example. Let G be the symmetric group $S_3$
with its usual presentation, and let H be the cyclic subgroup {1, y}. Its left cosets are


(6.8.2) $C_1 = H = \{1, y\}$, $C_2 = xH = \{x, xy\}$, $C_3 = x^2H = \{x^2, x^2y\}$


(see (2.8.4)), and G operates on the set of cosets $G/H = \{[C_1], [C_2], [C_3]\}$. The elements
x and y operate in the same way as on the set of indices {1, 2, 3}:


(6.8.3) $m_x \leftrightarrow (1\,2\,3)$ and $m_y \leftrightarrow (2\,3)$.


For instance, $yC_2 = \{yx, yxy\} = \{x^2y, x^2\} = C_3$.
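
The example can be worked through mechanically. The sketch below assumes the usual presentation of $S_3$, with relations $x^3 = 1$, $y^2 = 1$, $yx = x^2y$, and stores the element $x^i y^j$ as the pair (i, j). The function names `mul` and `m` are hypothetical, and cosets are numbered 1, 2, 3 as in (6.8.2).

```python
def mul(g, h):
    """Multiply x^i y^j by x^k y^l, using x^3 = 1, y^2 = 1, yx = x^2 y.
    Moving y past x^k replaces x^k by x^(-k)."""
    (i, j), (k, l) = g, h
    return ((i + (k if j == 0 else -k)) % 3, (j + l) % 2)

one, x, y = (0, 0), (1, 0), (0, 1)
H = [one, y]

# The left cosets C1 = H, C2 = xH, C3 = x^2 H of (6.8.2).
representatives = [one, x, mul(x, x)]
cosets = [frozenset(mul(a, h) for h in H) for a in representatives]

def m(g):
    """Permutation of the coset indices 1, 2, 3 induced by left multiplication by g."""
    return {k + 1: cosets.index(frozenset(mul(g, c) for c in C)) + 1
            for k, C in enumerate(cosets)}

print(m(x))   # {1: 2, 2: 3, 3: 1}, the cycle (1 2 3)
print(m(y))   # {1: 1, 2: 3, 3: 2}, the transposition (2 3)
```

The output for y shows in particular that $yC_2 = C_3$, as computed by hand above.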
The next proposition, sometimes called the orbit-stabilizer theorem, shows how an 
arbitrary group operation can be described in terms of operations on cosets. 


Proposition 6.8.4 Let S be a set on which a group G operates, and let s be an element
of S. Let H and $O_s$ be the stabilizer and orbit of s, respectively. There is a bijective map
$\varepsilon : G/H \to O_s$ defined by $[aH] \rightsquigarrow as$. This map is compatible with the operations of the
group: $\varepsilon(g[C]) = g\,\varepsilon([C])$ for every coset C and every element g in G.


For example, the dihedral group $D_5$ operates on the vertices of a regular pentagon.
Let V denote the set of vertices, and let H be the stabilizer of a particular vertex. There is
a bijective map $D_5/H \to V$. In the operation of the group M of isometries of the plane P,
the orbit of a point is the set of all points of P. The stabilizer of the origin is the group $O_2$ of
orthogonal operators, and there is a bijective map $M/O_2 \to P$. Similarly, if H denotes the
stabilizer of a line and if $\mathcal{L}$ denotes the set of all lines in the plane, there is a bijective map
$M/H \to \mathcal{L}$.


Proof of Proposition (6.8.4). It is clear that the map $\varepsilon$ defined in the statement of the
proposition will be compatible with the operation of the group, if it exists. Symbolically, $\varepsilon$
simply replaces H by the symbol s. What is not so clear is that the rule $[gH] \rightsquigarrow gs$ defines a
map at all. Since many symbols gH represent the same coset, we must show that if a and b
are group elements, and if the cosets aH and bH are equal, then as and bs are equal too.
Suppose that aH = bH. Then $a^{-1}b$ is in H (2.8.5). Since H is the stabilizer of s, $a^{-1}bs = s$,
and therefore as = bs. Our definition is legitimate, and reading this reasoning backward,
we also see that $\varepsilon$ is an injective map. Since $\varepsilon$ carries [gH] to gs, which can be an arbitrary
element of $O_s$, $\varepsilon$ is surjective as well as injective. □
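
The three claims of the proof (well defined, bijective, compatible) can be tested on the pentagon example. The sketch below models $D_5$ as pairs (i, j) standing for $x^i y^j$, operating on vertex indices 0, ..., 4 by $x : v \rightsquigarrow v + 1$ and $y : v \rightsquigarrow -v$ modulo 5. This model, the choice s = 0, and all names are assumptions of the illustration.

```python
n = 5
G = [(i, j) for i in range(n) for j in range(2)]    # (i, j) stands for x^i y^j

def mul(g, h):
    """Product in D_n, using y x^k = x^(-k) y."""
    (i, j), (k, l) = g, h
    return ((i + (k if j == 0 else -k)) % n, (j + l) % 2)

def act(g, v):
    """x^i y^j sends the vertex v to i + v or i - v (mod n)."""
    i, j = g
    return (i + (v if j == 0 else -v)) % n

s = 0
H = [g for g in G if act(g, s) == s]                # stabilizer of s
orbit = {act(g, s) for g in G}                      # orbit of s
cosets = {frozenset(mul(a, h) for h in H) for a in G}

# Well defined: all elements of one coset send s to the same vertex.
well_defined = all(len({act(a, s) for a in C}) == 1 for C in cosets)

# The map eps: [aH] -> as, computed from an arbitrary representative.
eps = {C: act(next(iter(C)), s) for C in cosets}
bijective = len(set(eps.values())) == len(cosets) == len(orbit)

# Compatibility: eps(g[C]) = g eps([C]).
compatible = all(eps[frozenset(mul(g, a) for a in C)] == act(g, eps[C])
                 for g in G for C in cosets)

print(len(H), len(cosets), well_defined, bijective, compatible)
# 2 5 True True True
```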


Note: The reasoning that we made to define the map $\varepsilon$ occurs frequently. Suppose that a set
$\overline{S}$ is presented as the set of equivalence classes of an equivalence relation on a set S, and let
$\pi : S \to \overline{S}$ be the map that sends an element s to its equivalence class $\overline{s}$. A common way to
define a map $\varepsilon$ from $\overline{S}$ to another set T is this: Given x in $\overline{S}$, one chooses an element s in S
such that $x = \overline{s}$, and defines $\varepsilon(x)$ in terms of s. Then one must show, as we did above, that
the definition doesn't depend on the choice of the element s whose equivalence class is x,
but only on x. This process is referred to as showing that the map is well defined. □


6.9 THE COUNTING FORMULA 


Let H be a subgroup of a finite group G. As we know, all cosets of H in G have the same
number of elements, and with the notation G/H for the set of cosets, the order |G/H| is
what is called the index [G : H] of H in G. The Counting Formula 2.8.8 becomes


(6.9.1) $|G| = |H|\,|G/H|$.

There is a similar formula for an orbit of any group operation:


Proposition 6.9.2 Counting Formula. Let S be a finite set on which a group G operates, and
let $G_s$ and $O_s$ be the stabilizer and orbit of an element s of S. Then


$|G| = |G_s|\,|O_s|$, or
(order of G) = (order of stabilizer)·(order of orbit).


This follows from (6.9.1) and Proposition (6.8.4). □


Thus the order of the orbit is equal to the index of the stabilizer,

(6.9.3) $|O_s| = [G : G_s]$,


and it divides the order of the group. There is one such formula for every element s of S.


Another formula uses the partition of the set S into orbits to count its elements. We
number the orbits that make up S arbitrarily, as $O_1, \ldots, O_k$. Then


(6.9.4) $|S| = |O_1| + |O_2| + \cdots + |O_k|$.

Formulas 6.9.2 and 6.9.4 have many applications.


Examples 6.9.5 (a) The group G of rotational symmetries of a regular dodecahedron
operates transitively on the set F of its faces. The stabilizer $G_f$ of a particular face f
is the group of rotations by multiples of $2\pi/5$ about the center of f; its order is 5. The
dodecahedron has 12 faces. Formula 6.9.2 reads 60 = 5 · 12, so the order of G is 60. Or, G
operates transitively on the set V of vertices. The stabilizer $G_v$ of a vertex v is the group of
order 3 of rotations by multiples of $2\pi/3$ about that vertex. A dodecahedron has 20 vertices,
so 60 = 3 · 20, which checks. There is a similar computation for edges: G operates transitively
on the set of edges, and the stabilizer of an edge e contains the identity and a rotation by $\pi$
about the center of e. So $|G_e| = 2$. Since 60 = 2 · 30, a dodecahedron has 30 edges.
(b) We may also restrict an operation of a group G to a subgroup H. By restriction, an
operation of G on a set S defines an operation of H on S, and this operation leads to more
numerical relations. The H-orbit of an element s will be contained in the G-orbit of s, so a
single G-orbit will be partitioned into H-orbits.

For example, let F be the set of 12 faces of the dodecahedron, and let H be the
stabilizer of a particular face f, a cyclic group of order 5. The order of any H-orbit is
either 1 or 5. So when we partition the set F of 12 faces into H-orbits, we must find two
orbits of order 1. We do: H fixes f and it fixes the face opposite to f. The remaining faces
make two orbits of order 5. Formula 6.9.4 for the operation of the group H on the set
of faces is 12 = 1 + 1 + 5 + 5. Or, let K denote the stabilizer of a vertex, a cyclic group
of order 3. We may also partition the set F into K-orbits. In this case Formula 6.9.4 is
12 = 3 + 3 + 3 + 3. □
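
The same bookkeeping can be carried out by machine for a smaller solid. The sketch below uses the cube in place of the dodecahedron, an assumption made only to keep the example short: the rotation group is generated by two quarter turns, written as permutations of the six faces, and Formulas 6.9.2 and 6.9.4 are then checked for the operation on faces. The face labels and the names `compose`, `generate`, `z_turn`, and `x_turn` are hypothetical.

```python
def compose(g, h):
    """The product gh of two permutations: apply h first, then g."""
    return tuple(g[h[i]] for i in range(len(h)))

def generate(gens):
    """The finite group generated by gens (closure under left multiplication)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for x in gens:
            xg = compose(x, g)
            if xg not in group:
                group.add(xg)
                frontier.append(xg)
    return group

# Faces of the cube: 0 top, 1 bottom, 2 front, 3 right, 4 back, 5 left.
z_turn = (0, 1, 3, 4, 5, 2)    # quarter turn about the vertical axis
x_turn = (2, 4, 1, 3, 0, 5)    # quarter turn about the left-right axis
G = generate([z_turn, x_turn])

f = 0
G_f = {g for g in G if g[f] == f}        # stabilizer of the top face
O_f = {g[f] for g in G}                  # orbit of the top face
print(len(G), len(G_f), len(O_f))        # 24 4 6, so 24 = 4 * 6

# Restrict to H = G_f and partition the faces into H-orbits (Formula 6.9.4).
h_orbits = {frozenset(h[s] for h in G_f) for s in range(6)}
print(sorted(len(o) for o in h_orbits))  # [1, 1, 4], so 6 = 1 + 1 + 4
```

As with the dodecahedron, the stabilizer of a face fixes that face and the opposite one, and permutes the remaining faces in a single orbit.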


6.10 OPERATIONS ON SUBSETS 


Suppose that a group G operates on a set S. If U is a subset of S of order r,
(6.10.1) $gU = \{gu \mid u \in U\}$


is another subset of order r. This allows us to define an operation of G on the set of subsets 
of order r of S. The axioms for an operation are verified easily. 

For instance, let O be the octahedral group of 24 rotations of a cube, and let F 
be the set of six faces of the cube. Then O also operates on the subsets of F of order 
two, that is, on unordered pairs of faces. There are 15 pairs, and they form two orbits: 
{pairs of opposite faces} ∪ {pairs of adjacent faces}. These orbits have orders 3 and 12,
respectively. 

The stabilizer of a subset U is the set of group elements g such that [gU] = [U], which 
is to say, gU = U. The stabilizer of a pair of opposite faces has order 8. 

Note this point once more: The stabilizer of U consists of the group elements such that 
gU =U. This means that g permutes the elements within U, that is, whenever u is in U, gu 
is also in U. 
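
The numbers quoted for the cube can be reproduced directly. The sketch below again assumes the rotation group of the cube is generated by two quarter turns written as permutations of the faces; the face labels and all function names are hypothetical choices for this illustration.

```python
from itertools import combinations

def compose(g, h):
    """The product gh of two permutations: apply h first, then g."""
    return tuple(g[h[i]] for i in range(len(h)))

def generate(gens):
    """The finite group generated by gens (closure under left multiplication)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for x in gens:
            xg = compose(x, g)
            if xg not in group:
                group.add(xg)
                frontier.append(xg)
    return group

# Faces of the cube: 0 top, 1 bottom, 2 front, 3 right, 4 back, 5 left.
O = generate([(0, 1, 3, 4, 5, 2), (2, 4, 1, 3, 0, 5)])

def move(g, U):
    """The subset gU = {gu : u in U} of (6.10.1)."""
    return frozenset(g[u] for u in U)

pairs = [frozenset(c) for c in combinations(range(6), 2)]
orbits = {frozenset(move(g, U) for g in O) for U in pairs}
print(len(pairs), sorted(len(o) for o in orbits))   # 15 [3, 12]

# Stabilizer of the pair {top, bottom} of opposite faces.
opposite = frozenset({0, 1})
stab = {g for g in O if move(g, opposite) == opposite}
print(len(stab))                                    # 8
```

The stabilizer contains rotations that exchange the top and bottom faces, which illustrates the point of the note: stabilizing U means permuting its elements, not fixing each of them.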


6.11 PERMUTATION REPRESENTATIONS 


In this section we analyze the various ways in which a group G can operate on a set S. 
• A permutation representation of a group G is a homomorphism from the group to a
symmetric group:


(6.11.1) $\varphi : G \to S_n$.


Proposition 6.11.2 Let G be a group. There is a bijective correspondence between operations
of G on the set S = {1, ..., n} and permutation representations $G \to S_n$:


{operations of G on S} $\longleftrightarrow$ {permutation representations}


Proof. This is very simple, though it can be confusing when one sees it for the first time. If
we are given an operation of G on S, we define a permutation representation $\varphi$ by setting
$\varphi(g) = m_g$, multiplication by g (6.7.1). The associative property g(hi) = (gh)i shows that


$m_g(m_h i) = g(hi) = (gh)i = m_{gh}\, i$.


Hence $\varphi$ is a homomorphism. Conversely, if $\varphi$ is a permutation representation, the same
formula defines an operation of G on S. □


For example, the operation of the dihedral group $D_n$ on the vertices $(v_1, \ldots, v_n)$ of a
regular n-gon defines a homomorphism $\varphi : D_n \to S_n$.
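
The homomorphism property $m_g m_h = m_{gh}$ can be checked exhaustively for a particular n. The sketch below takes n = 6 and models $D_n$ as pairs (i, j) standing for $x^i y^j$, operating on vertex indices 0, ..., n−1 (rather than 1, ..., n) by $x : v \rightsquigarrow v + 1$ and $y : v \rightsquigarrow -v$ modulo n. This model and the names `mul`, `phi`, and `compose` are assumptions of the illustration.

```python
n = 6
G = [(i, j) for i in range(n) for j in range(2)]    # (i, j) stands for x^i y^j

def mul(g, h):
    """Product in D_n, using y x^k = x^(-k) y."""
    (i, j), (k, l) = g, h
    return ((i + (k if j == 0 else -k)) % n, (j + l) % 2)

def phi(g):
    """The permutation m_g of the vertex indices, as a tuple of images."""
    i, j = g
    return tuple((i + (v if j == 0 else -v)) % n for v in range(n))

def compose(p, q):
    """Composition of permutations: apply q first, then p."""
    return tuple(p[q[v]] for v in range(n))

is_homomorphism = all(phi(mul(g, h)) == compose(phi(g), phi(h))
                      for g in G for h in G)
is_injective = len({phi(g) for g in G}) == len(G)
print(is_homomorphism, is_injective)   # True True
```

In this example distinct group elements give distinct permutations of the vertices.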


Proposition 6.11.2 has nothing to do with the fact that it works with a set of indices. If
Perm(S) is the group of permutations of an arbitrary set S, we also call a homomorphism
$\varphi : G \to \mathrm{Perm}(S)$ a permutation representation of G.


Corollary 6.11.3 Let Perm(S) denote the group of permutations of a set S, and let G be a
group. There is a bijective correspondence between operations of G on S and permutation
representations $\varphi : G \to \mathrm{Perm}(S)$:


{operations of G on S} $\longleftrightarrow$ {permutation representations $G \to \mathrm{Perm}(S)$} □


A permutation representation $G \to \mathrm{Perm}(S)$ needn't be injective. If it happens to be
injective, one says that the corresponding operation is faithful. To be faithful, an operation
must have the property that $m_g$, multiplication by g, is not the identity map unless g = 1:


(6.11.4) An operation is faithful if it has this property: 
The only element g of G such that gs = s for every s in S is the identity. 


The operation of the group of isometries M on the set S of equilateral triangles in the plane 
is faithful, because the only isometry that carries every equilateral triangle to itself is the 
identity. 
Permutation representations $\varphi : G \to \mathrm{Perm}(S)$ are rarely surjective because the order
of Perm(S) tends to be very large. But one case is given in the next example.


Example 6.11.5 The group $GL_2(\mathbb{F}_2)$ of invertible matrices with mod 2 coefficients is
isomorphic to the symmetric group $S_3$.


We denote the field $\mathbb{F}_2$ by F and the group $GL_2(F)$ by G. The space $F^2$ of column
vectors consists of four vectors:


$0 = \begin{bmatrix}0\\0\end{bmatrix}$, $e_1 = \begin{bmatrix}1\\0\end{bmatrix}$, $e_2 = \begin{bmatrix}0\\1\end{bmatrix}$, $e_1 + e_2 = \begin{bmatrix}1\\1\end{bmatrix}$.


The group G operates on the set of three nonzero vectors $S = \{e_1, e_2, e_1 + e_2\}$, and this
gives us a permutation representation $\varphi : G \to S_3$. The identity is the only matrix that fixes
both $e_1$ and $e_2$, so the operation of G on S is faithful, and $\varphi$ is injective. The columns of an
invertible matrix must be an ordered pair of distinct elements of S. There are six such pairs,
so |G| = 6. Since $S_3$ also has order six, $\varphi$ is an isomorphism. □
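
The example is small enough to enumerate completely. The sketch below lists the 2×2 matrices over $\mathbb{F}_2$ with determinant 1, records how each permutes the three nonzero vectors, and confirms that all six permutations occur exactly once. Matrices are stored as pairs of rows, and the vectors $e_1$, $e_2$, $e_1 + e_2$ are numbered 0, 1, 2; these conventions and the function names are assumptions of the illustration.

```python
from itertools import permutations, product

vectors = [(1, 0), (0, 1), (1, 1)]     # e1, e2, e1 + e2

def apply(A, v):
    """Multiply the matrix A (a pair of rows) by the column vector v, mod 2."""
    (a, b), (c, d) = A
    return ((a * v[0] + b * v[1]) % 2, (c * v[0] + d * v[1]) % 2)

def det(A):
    (a, b), (c, d) = A
    return (a * d - b * c) % 2

# The invertible matrices are those with determinant 1 mod 2.
G = [((a, b), (c, d))
     for a, b, c, d in product(range(2), repeat=4)
     if det(((a, b), (c, d))) == 1]

# phi(A) records where A sends each of the three nonzero vectors.
phi = {A: tuple(vectors.index(apply(A, v)) for v in vectors) for A in G}

print(len(G))                                             # 6
print(set(phi.values()) == set(permutations(range(3))))   # True
```

Since six matrices map onto six distinct permutations, the map is both injective and surjective, as the example asserts.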


6.12 FINITE SUBGROUPS OF THE ROTATION GROUP 


In this section, we apply the Counting Formula to classify the finite subgroups of $SO_3$, the
group of rotations of $\mathbb{R}^3$. As happens with finite groups of isometries of the plane, all of them
are symmetry groups of familiar figures.


Theorem 6.12.1 A finite subgroup of $SO_3$ is one of the following groups:


$C_k$: the cyclic group of rotations by multiples of $2\pi/k$ about a line, with k arbitrary;
$D_k$: the dihedral group of symmetries of a regular k-gon, with k arbitrary;

T: the tetrahedral group of 12 rotational symmetries of a tetrahedron;

O: the octahedral group of 24 rotational symmetries of a cube or an octahedron;

I: the icosahedral group of 60 rotational symmetries of a dodecahedron or an icosahedron.


Note: The dihedral groups are usually presented as groups of symmetry of a regular polygon
in the plane, where reflections reverse orientation. However, a reflection of a plane can be
achieved by a rotation through the angle $\pi$ in three-dimensional space, and in this way the
symmetries of a regular polygon can be realized as rotations of $\mathbb{R}^3$. The dihedral group $D_n$
can be generated by a rotation x with angle $2\pi/n$ about the $e_1$-axis and a rotation y with
angle $\pi$ about the $e_2$-axis. With $c = \cos 2\pi/n$ and $s = \sin 2\pi/n$, the matrices that represent
these rotations are


(6.12.2) $x = \begin{bmatrix} 1 & & \\ & c & -s \\ & s & c \end{bmatrix}$, and $y = \begin{bmatrix} -1 & & \\ & 1 & \\ & & -1 \end{bmatrix}$. □

Let G be a finite subgroup of $SO_3$, of order N > 1. We'll call a pole of an element
$g \ne 1$ of G a pole of the group. Any rotation of $\mathbb{R}^3$ except the identity has two poles: the
intersections of the axis of rotation with the unit sphere $S^2$. So a pole of G is a point on the
2-sphere that is fixed by a group element g different from 1.
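
The matrices in (6.12.2) can be checked against the defining relations of the dihedral group, $x^n = 1$, $y^2 = 1$, and $yxy = x^{-1}$. The sketch below does this numerically for n = 5, an arbitrary choice; the helper names and the tolerance are assumptions of the illustration. Since x is orthogonal, its inverse is its transpose.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def close(A, B, tol=1e-9):
    """Entrywise comparison up to floating-point error."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(3) for j in range(3))

def power(A, m):
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(m):
        result = matmul(result, A)
    return result

n = 5
c, s = math.cos(2 * math.pi / n), math.sin(2 * math.pi / n)
x = [[1, 0, 0], [0, c, -s], [0, s, c]]      # rotation by 2*pi/n about the e1-axis
y = [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]     # rotation by pi about the e2-axis
identity = power(x, 0)

relations = (
    close(power(x, n), identity),                    # x^n = 1
    close(matmul(y, y), identity),                   # y^2 = 1
    close(matmul(matmul(y, x), y), transpose(x)),    # y x y = x^{-1}
)
print(relations)   # (True, True, True)
```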


Example 6.12.3. The group T of rotational symmetries of a tetrahedron $\Delta$ has order 12. Its
poles are the points of $S^2$ that lie above the centers of the faces, the vertices, and the centers
of the edges. Since $\Delta$ has four faces, four vertices, and six edges, there are 14 poles.


|poles| = 14 = |faces| + |vertices| + |edges|


Each of the 11 elements $g \ne 1$ of T has two spins, two pairs (g, p), where p is a pole of g.
So there are 22 spins altogether. The stabilizer of a face has order 3. Its two elements $\ne 1$
share a pole above the center of a face. Similarly, there are two elements with a pole above
a vertex, and one element with a pole above the center of an edge.


|spins| = 22 = 2 |faces| + 2 |vertices| + |edges|


Let P denote the set of all poles of a finite subgroup G. We will get information about the 
group by counting these poles. As the example shows, the count can be confusing. 


Lemma 6.12.4 The set P of poles of G is a union of G-orbits. So G operates on P.


Proof. Let p be a pole, say the pole of an element $g \ne 1$ in G, let h be another element of G,
and let q = hp. We have to show that q is a pole, meaning that q is fixed by some element
$g'$ of G other than the identity. The required element is $hgh^{-1}$. This element is not equal to
1 because $g \ne 1$, and $hgh^{-1}q = hgp = hp = q$. □


The stabilizer $G_p$ of a pole p is the group of all of the rotations about p that are in G.
It is a cyclic group, generated by the rotation of smallest positive angle $\theta$. We'll denote its
order by $r_p$. Then $\theta = 2\pi/r_p$.
Since p is a pole, the stabilizer $G_p$ contains an element besides 1, so $r_p > 1$. The set of
elements of G with pole p is the stabilizer $G_p$, with the identity element omitted. So there
are $r_p - 1$ group elements that have p as pole. Every group element g except one has two
poles. Since |G| = N, there are 2N − 2 spins. This gives us the relation


(6.12.5) $\sum_{p \in P} (r_p - 1) = 2(N - 1)$.


We collect terms to simplify the left side of this equation: Let $n_p$ denote the order of the
orbit $O_p$ of p. By the Counting Formula (6.9.2),


(6.12.6) $r_p n_p = N.$


If two poles p and p’ are in the same orbit, their orbits are equal, so $n_p = n_{p'}$, and therefore
$r_p = r_{p'}$. We label the various orbits arbitrarily, say as $O_1, O_2, \ldots, O_k$, and we let $n_i = n_p$
and $r_i = r_p$ for p in $O_i$, so that $n_i r_i = N$. Since the orbit $O_i$ contains $n_i$ elements, there are
$n_i$ terms equal to $r_i - 1$ on the left side of (6.12.5). We collect those terms together. This
gives us the equation


$\sum_{i=1}^{k} n_i (r_i - 1) = 2N - 2.$


We divide both sides by N to get a famous formula: 


(6.12.7) $\sum_{i} \left(1 - \frac{1}{r_i}\right) = 2 - \frac{2}{N}.$


This may not look like a promising tool, but in fact it tells us a great deal. The right side is
between 1 and 2, while each term on the left is at least $\frac{1}{2}$. It follows that there can be at most
three orbits.

The rest of the classification is made by listing the possibilities: 


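One orbit: $1 - \frac{1}{r} = 2 - \frac{2}{N}$. This is impossible, because $1 - \frac{1}{r} < 1$, while $2 - \frac{2}{N} \geq 1$.


Two orbits: $\left(1 - \frac{1}{r_1}\right) + \left(1 - \frac{1}{r_2}\right) = 2 - \frac{2}{N}$, that is, $\frac{1}{r_1} + \frac{1}{r_2} = \frac{2}{N}$.


Because $r_i$ divides N, this equation holds only when $r_1 = r_2 = N$, and then $n_1 = n_2 = 1$.
There are two poles $p_1$ and $p_2$, both fixed by every element of the group. So G is the cyclic
group $C_N$ of rotations whose axis of rotation is the line $\ell$ through $p_1$ and $p_2$.


Three orbits: $\left(1 - \frac{1}{r_1}\right) + \left(1 - \frac{1}{r_2}\right) + \left(1 - \frac{1}{r_3}\right) = 2 - \frac{2}{N}$.


This is the most interesting case. Since $\frac{2}{N}$ is positive, the formula implies that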


(6.12.8) $\frac{1}{r_1} + \frac{1}{r_2} + \frac{1}{r_3} > 1.$


We arrange the $r_i$ in increasing order. Then $r_1 = 2$: If all $r_i$ were at least 3, the left side
would be $\leq 1$.

Case 1: $r_1 = r_2 = 2$. The third order $r_3 = k$ can be arbitrary, and N = 2k:

$r_i = 2, 2, k; \quad n_i = k, k, 2; \quad N = 2k.$


There is one pair of poles {p, p’} making the orbit $O_3$. Half of the elements of G fix p,
and the other half interchange p and p’. So the elements of G are rotations about the line
$\ell$ through p and p’, or else they are rotations by $\pi$ about a line perpendicular to $\ell$. The
group G is the group of rotations fixing a regular k-gon $\Delta$, the dihedral group $D_k$. The
polygon $\Delta$ lies in the plane perpendicular to $\ell$, and the vertices and the centers of edges
of $\Delta$ correspond to the remaining poles. The bilateral symmetries of $\Delta$ in $\mathbb{R}^2$ have become
rotations through the angle $\pi$ in $\mathbb{R}^3$.


Case 2: $r_1 = 2$ and $2 < r_2 \leq r_3$. The equation 1/2 + 1/4 + 1/4 = 1 rules out the possibility
that $r_2 \geq 4$. Therefore $r_2 = 3$. Then the equation 1/2 + 1/3 + 1/6 = 1 rules out $r_3 \geq 6$. Only
three possibilities remain:


(6.12.9)


(i) $r_i = 2, 3, 3$; $n_i = 6, 4, 4$; N = 12.
The poles in the orbit $O_3$ are the vertices of a regular tetrahedron, and G is the
tetrahedral group T of its 12 rotational symmetries.


(ii) $r_i = 2, 3, 4$; $n_i = 12, 8, 6$; N = 24.
The poles in the orbit $O_3$ are the vertices of a regular octahedron, and G is the
octahedral group O of its 24 rotational symmetries.


(iii) $r_i = 2, 3, 5$; $n_i = 30, 20, 12$; N = 60.
The poles in the orbit $O_3$ are the vertices of a regular icosahedron, and G is the
icosahedral group I of its 60 rotational symmetries.


In each case, the integers $n_i$ are the numbers of edges, faces, and vertices, respectively.
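The case analysis can be cross-checked by brute force. The following Python sketch is not part of the classification argument; it simply searches, for each N up to an arbitrarily chosen bound of 60, for orders $r_i \geq 2$ dividing N that satisfy (6.12.7) with at most three orbits. The function name `pole_orbit_solutions` and the bound are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def pole_orbit_solutions(max_order):
    """Return all (N, (r_1, ..., r_k)) with k <= 3, r_i >= 2, r_i | N,
    r_1 <= ... <= r_k and sum(1 - 1/r_i) == 2 - 2/N, for 2 <= N <= max_order."""
    solutions = []
    for N in range(2, max_order + 1):
        divisors = [r for r in range(2, N + 1) if N % r == 0]
        target = 2 - Fraction(2, N)
        for k in (1, 2, 3):
            for rs in combinations_with_replacement(divisors, k):
                if sum(1 - Fraction(1, r) for r in rs) == target:
                    solutions.append((N, rs))
    return solutions

# Three-orbit solutions that are not of the dihedral type (2, 2, k).
exceptional = [(N, rs) for N, rs in pole_orbit_solutions(60)
               if len(rs) == 3 and rs[:2] != (2, 2)]

for N, rs in exceptional:
    ns = tuple(N // r for r in rs)  # orbit orders n_i = N / r_i
    # Spin count (6.12.5): the sum of n_i (r_i - 1) should equal 2N - 2.
    spins_match = sum(n * (r - 1) for n, r in zip(ns, rs)) == 2 * N - 2
    print(N, rs, ns, spins_match)
```

Within this range the search finds one-orbit solutions for no N, the cyclic type (N, N), the dihedral type (2, 2, k) with N = 2k, and the three entries of (6.12.9).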

Intuitively, the poles in an orbit should be the vertices of a regular polyhedron because 
they must be evenly spaced on the sphere. However, this isn’t quite correct, because the 
centers of the edges of a cube, for example, form an orbit, but they do not span a regular 
polyhedron. The figure they span is called a truncated polyhedron. 

We'll verify the assertion of (iii). Let V be the orbit O3 of order twelve. We want to 
show that the poles in this orbit are the vertices of a regular icosahedron. Let p be one of 
the poles in V. Thinking of p as the north pole of the unit sphere gives us an equator and 
a south pole. Let H be the stabilizer of p. Since $r_3 = 5$, this is a cyclic group, generated by
a rotation x about p with angle $2\pi/5$. When we decompose V into H-orbits, we must get
at least two H-orbits of order 1. These are the north and south poles. The ten other poles
making up V form two H-orbits of order 5. We write them as $\{q_0, \ldots, q_4\}$ and $\{q'_0, \ldots, q'_4\}$,
where $q_i = x^i q_0$ and $q'_i = x^i q'_0$. By symmetry between the north and south poles, one of
these H-orbits is in the northern hemisphere and one is in the southern hemisphere, or else
both are on the equator. Let’s say that the orbit $\{q_i\}$ is in the northern hemisphere or on the
equator.

Let |x, y| denote the spherical distance between points x and y on the unit sphere. We
note that $d = |p, q_i|$ is independent of i = 0, ..., 4, because there is an element of H that
carries $q_0 \rightsquigarrow q_i$, while fixing p. Similarly, $d' = |p, q'_i|$ is independent of i. So as p’ ranges over
the orbit V the distance |p, p’| takes on only four values 0, d, d’ and $\pi$. The values d and d’
are taken on five times each, and 0 and $\pi$ are taken on once. Since G operates transitively
on V, we will obtain the same four values when p is replaced by any other pole in V.

We note that $d \leq \pi/2$ while $d' \geq \pi/2$. Because there are five poles in the orbit $\{q_i\}$,
the spherical distance $|q_i, q_{i+1}|$ is less than $\pi/2$, so it is equal to d, and $d < \pi/2$. Therefore
that orbit isn’t on the equator. The three poles $p, q_i, q_{i+1}$ form an equilateral triangle. There
are five congruent equilateral triangles meeting at p, and therefore five congruent triangles
meet at each pole. They form the faces of an icosahedron.


Note: There are just five regular polyhedra. This can be proved by counting the number of 
ways that one can begin to build one by bringing congruent regular polygons together at a 
vertex. One can assemble three, four, or five equilateral triangles, three squares, or three 
regular pentagons. (Six triangles, four squares, or three hexagons glue together into flat 
surfaces.) So there are just five possibilities. But this analysis omits the interesting question 
of existence. Does an icosahedron exist? Of course, we can build one out of cardboard. But 
when we do, the triangles never fit together precisely, and we take it on faith that this is due 
to our imprecision. If we drew the analogous conclusion about the circle of fifths in music, 
we’d be wrong: the circle of fifths almost closes up, but not quite. The best way to be sure 
that the icosahedron exists may be to write down the coordinates of its vertices and check 
the distances. This is Exercise 12.7. □
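A numerical version of this check might look as follows. The sketch assumes the coordinates of Exercise 12.7, namely (±1, ±a, 0), (0, ±1, ±a), (±a, 0, ±1), and takes a to be the golden ratio; that value is an assumption here, not something derived in the text above. Floating-point agreement is a consistency check and not a proof.

```python
from itertools import product
from math import isclose, sqrt

a = (1 + sqrt(5)) / 2  # assumed value of a (the golden ratio); not derived here

# The 12 candidate vertices (±1, ±a, 0), (0, ±1, ±a), (±a, 0, ±1).
vertices = []
for s, t in product((1, -1), repeat=2):
    vertices += [(s, t * a, 0), (0, s, t * a), (s * a, 0, t)]

def dist(u, v):
    """Euclidean distance between two points of R^3."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def neighbors(v, edge=2.0):
    """Vertices at distance `edge` from v, up to floating-point rounding."""
    return [w for w in vertices if w != v and isclose(dist(v, w), edge)]

# All vertices should have the same norm, and each should have exactly
# five others at distance 2 from it (the candidate edge length).
norms = {round(dist(v, (0, 0, 0)), 12) for v in vertices}
degrees = {len(neighbors(v)) for v in vertices}
print(len(set(vertices)), norms, degrees)
```

With this choice of a, every vertex lies on one sphere and has five nearest neighbors at the common distance 2, which is what the five triangles meeting at each vertex require.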


Our discussion of the isometries of the plane has analogues for the group of isometries 
of three-space. One can define the notion of a crystallographic group, a discrete subgroup 
whose translation group is a three-dimensional lattice. The crystallographic groups are
analogous to two-dimensional lattice groups, and crystals form examples of three-dimensional
configurations having such groups as symmetry. It can be shown that there are 230 types of 
crystallographic groups, analogous to the 17 lattice groups (6.6.2). This is too long a list to 
be useful, so crystals have been classified more crudely into seven crystal systems. For more 
about this, and for a discussion of the 32 crystallographic point groups, look in a book on 
crystallography, such as [Schwarzenbach]. 


Un bon héritage vaut mieux que le plus joli problème
de géométrie, parce qu’il tient lieu de méthode
générale, et sert à résoudre bien des problèmes.


—Gottfried Wilhelm Leibniz? 


2 I learned this quote from V.I. Arnold. L’Hôpital had written to Leibniz, apologizing for a long silence, and
saying that he had been in the country taking care of an inheritance. In his reply, Leibniz told him not to worry, and
continued with the sentence quoted.


EXERCISES


Section 1 Symmetry of Plane Figures


1.1. Determine all symmetries of Figures 6.1.4, 6.1.6, and 6.1.7.


Section 3 Isometries of the Plane 


3.1. Verify the rules (6.3.3).

3.2. Let m be an orientation-reversing isometry. Prove algebraically that $m^2$ is a translation.

3.3. Prove that a linear operator on $\mathbb{R}^2$ is a reflection if and only if its eigenvalues are 1 and −1,
and the eigenvectors with these eigenvalues are orthogonal.

3.4. Prove that a conjugate of a glide reflection in M is a glide reflection, and that the glide
vectors have the same length.

3.5. Write formulas for the isometries (6.3.1) in terms of a complex variable z = x + iy.

3.6. (a) Let s be the rotation of the plane with angle $\pi/2$ about the point $(1, 1)^t$. Write the
formula for s as a product $t_a \rho_\theta$.
(b) Let s denote reflection of the plane about the vertical axis x = 1. Find an isometry g
such that $grg^{-1} = s$, and write s in the form $t_a \rho_\theta r$.


Section 4 Finite Groups of Orthogonal Operators on the Plane 


4.1. Write the product $x^2 y x^{-1} y^{-1} x^3 y^3$ in the form $x^i y^j$ in the dihedral group $D_n$.

4.2. (a) List all subgroups of the dihedral group $D_4$, and decide which ones are normal.
(b) List the proper normal subgroups N of the dihedral group $D_{15}$, and identify the
quotient groups $D_{15}/N$.
(c) List the subgroups of $D_6$ that do not contain $x^3$.

4.3. (a) Compute the left cosets of the subgroup $H = \{1, x^5\}$ in the dihedral group $D_{10}$.
(b) Prove that H is normal and that $D_{10}/H$ is isomorphic to $D_5$.
(c) Is $D_{10}$ isomorphic to $D_5 \times H$?


Section 5 Discrete Groups of Isometries 


5.1. Let $\ell_1$ and $\ell_2$ be lines through the origin in $\mathbb{R}^2$ that intersect in an angle $\pi/n$, and let $r_i$ be
the reflection about $\ell_i$. Prove that $r_1$ and $r_2$ generate a dihedral group $D_n$.

5.2. What is the crystallographic restriction for a discrete group of isometries whose translation
group L has the form $\mathbb{Z}a$ with $a \neq 0$?

5.3. How many sublattices of index 3 are contained in a lattice L in $\mathbb{R}^2$?

5.4. Let (a, b) be a lattice basis of a lattice L in $\mathbb{R}^2$. Prove that every other lattice basis has the
form (a’, b’) = (a, b)P, where P is a 2×2 integer matrix with determinant ±1.

5.5. Prove that the group of symmetries of the frieze pattern [frieze pattern figure] is isomorphic to the
direct product $C_2 \times C_\infty$ of a cyclic group of order 2 and an infinite cyclic group.

5.6. Let G be the group of symmetries of the frieze pattern [frieze pattern figure]. Determine
the point group $\overline{G}$ of G, and the index in G of its subgroup of translations.

5.7. Let N denote the group of isometries of a line $\mathbb{R}^1$. Classify discrete subgroups of N,
identifying those that differ in the choice of origin and unit length on the line.


*5.8. Let N’ be the group of isometries of an infinite ribbon
$R = \{(x, y) \mid -1 \leq y \leq 1\}.$
It can be viewed as a subgroup of the group M. The following elements are in N’:


$t_a: (x, y) \to (x + a, y)$
$s: (x, y) \to (-x, y)$
$r: (x, y) \to (x, -y)$
$\rho: (x, y) \to (-x, -y).$


(a) State and prove analogues of (6.3.3) for these isometries. 


(b) A frieze pattern is a pattern on the ribbon that is periodic and whose group of 
symmetries is discrete. Classify the corresponding symmetry groups, identifying those 
that differ in the choice of origin and unit length on the ribbon. Begin by making some 
patterns with different symmetries. Make a careful case analysis when proving your 
results. 


5.9. Let G be a discrete subgroup of M whose translation group is not trivial. Prove that
there is a point $p_0$ in the plane that is not fixed by any element of G except the
identity.

5.10. Let f and g be rotations of the plane about distinct points, with arbitrary nonzero
angles of rotation $\theta$ and $\varphi$. Prove that the group generated by f and g contains a
translation.


5.11. If S and S’ are subsets of $\mathbb{R}^n$ with $S \subset S'$, then S is dense in S’ if for every element s’ of S’,
there are elements of S arbitrarily near to s’.
(a) Prove that a subgroup $\Gamma$ of $\mathbb{R}^+$ is either dense in $\mathbb{R}$, or else discrete.
(b) Prove that the subgroup of $\mathbb{R}^+$ generated by 1 and $\sqrt{2}$ is dense in $\mathbb{R}^+$.
(c) Let H be a subgroup of the group G of angles. Prove that H is either a cyclic subgroup
of G or else it is dense in G.


5.12. Classify discrete subgroups of the additive group $\mathbb{R}^{3+}$.


Section 6 Plane Crystallographic Groups


6.1. (a) Determine the point group G for each of the patterns depicted in Figure (6.6.2). 
(b) For which of the patterns can coordinates be chosen so that the group G operates on 
the lattice L? 
6.2. Let G be the group of symmetries of an equilateral triangular lattice L. Determine the 
index in G of the subgroup of translations in G. 
6.3. With each of the patterns shown, determine the point group and find a pattern with the
same type of symmetry in Figure 6.6.2.

*6.4. Classify plane crystallographic groups with point group $D_1 = \{1, r\}$.

6.5. (a) Prove that if the point group of a two-dimensional crystallographic group G is $C_6$ or
$D_6$, the translation group L is an equilateral triangular lattice.
(b) Classify those groups.


*6.6. Prove that symmetry groups of the figures in Figure 6.6.2 exhaust the possibilities. 


Section 7 Abstract Symmetry: Group Operations 


7.1. Let $G = D_4$ be the dihedral group of symmetries of the square.
(a) What is the stabilizer of a vertex? of an edge?
(b) G operates on the set of two elements consisting of the diagonal lines. What is the
stabilizer of a diagonal?

7.2. The group M of isometries of the plane operates on the set of lines in the plane. Determine
the stabilizer of a line.

7.3. The symmetric group $S_3$ operates on two sets U and V of order 3. Decompose the product
set $U \times V$ into orbits for the “diagonal action” g(u, v) = (gu, gv), when
(a) the operations on U and V are transitive,
(b) the operation on U is transitive, the orbits for the operation on V are $\{v_1\}$ and $\{v_2, v_3\}$.

7.4. In each of the figures in Exercise 6.3, find the points that have nontrivial stabilizers, and
identify the stabilizers.

7.5. Let G be the group of symmetries of a cube, including the orientation-reversing symmetries.
Describe the elements of G geometrically.

7.6. Let G be the group of symmetries of an equilateral triangular prism P, including the
orientation-reversing symmetries. Determine the stabilizer of one of the rectangular faces
of P and the order of the group.

7.7. Let $G = GL_n(\mathbb{R})$ operate on the set $V = \mathbb{R}^n$ by left multiplication.
(a) Describe the decomposition of V into orbits for this operation.
(b) What is the stabilizer of $e_1$?

7.8. Decompose the set $\mathbb{C}^{2 \times 2}$ of 2×2 complex matrices into orbits for the following operations
of $GL_2(\mathbb{C})$: (a) left multiplication, (b) conjugation.

7.9. (a) Let S be the set $\mathbb{R}^{m \times n}$ of real m×n matrices, and let $G = GL_m(\mathbb{R}) \times GL_n(\mathbb{R})$. Prove
that the rule $(P, Q) * A = PAQ^{-1}$ defines an operation of G on S.
(b) Describe the decomposition of S into G-orbits.
(c) Assume that $m \leq n$. What is the stabilizer of the matrix [I | 0]?

7.10. (a) Describe the orbit and the stabilizer of the matrix [2×2 matrix not recoverable from the source] under conjugation in the
general linear group $GL_2(\mathbb{R})$.
(b) Interpreting the matrix in $GL_2(\mathbb{F}_5)$, find the order of the orbit.

7.11. Prove that the only subgroup of order 12 of the symmetric group $S_4$ is the alternating
group $A_4$.


Section 8 The Operation on Cosets 


8.1. Does the rule $P * A = PAP^t$ define an operation of $GL_n$ on the set of n×n matrices?

8.2. What is the stabilizer of the coset [aH] for the operation of G on G/H?

8.3. Exhibit the bijective map (6.8.4) explicitly, when G is the dihedral group $D_4$ and S is the
set of vertices of a square.

8.4. Let H be the stabilizer of the index 1 for the operation of the symmetric group $G = S_n$
on the set of indices {1, ..., n}. Describe the left cosets of H in G and the map (6.8.4) in
this case.


Section 9 The Counting Formula


9.1. Use the counting formula to determine the orders of the groups of rotational symmetries
of a cube and of a tetrahedron.

9.2. Let G be the group of rotational symmetries of a cube, let $G_v$, $G_e$, $G_f$ be the stabilizers
of a vertex v, an edge e, and a face f of the cube, and let V, E, F be the sets of vertices,
edges, and faces, respectively. Determine the formulas that represent the decomposition
of each of the three sets into orbits for each of the subgroups.

9.3. Determine the order of the group of symmetries of a dodecahedron, when orientation-reversing
symmetries such as reflections in planes are allowed.

9.4. Identify the group T’ of all symmetries of a regular tetrahedron, including orientation-reversing
symmetries.

9.5. Let F be a section of an I-beam, which one can think of as the product set of the letter
I and the unit interval. Identify its group of symmetries, orientation-reversing symmetries
included.

9.6. Identify the group of symmetries of a baseball, taking the seam (but not the stitching) into
account and allowing orientation-reversing symmetries.


Section 10 Operations on Subsets 


10.1. Determine the orders of the orbits for left multiplication on the set of subsets of order 3 of
$D_3$.

10.2. Let S be a finite set on which a group G operates transitively, and let U be a subset of S.
Prove that the subsets gU cover S evenly, that is, that every element of S is in the same
number of sets gU.

10.3. Consider the operation of left multiplication by G on the set of its subsets. Let U be a
subset such that the sets gU partition G. Let H be the unique subset in this orbit that
contains 1. Prove that H is a subgroup of G.


Section 11 Permutation Representations 


11.1. Describe all ways in which $S_3$ can operate on a set of four elements.

11.2. Describe all ways in which the tetrahedral group T can operate on a set of two elements.

11.3. Let S be a set on which a group G operates, and let H be the subset of elements g such
that gs = s for all s in S. Prove that H is a normal subgroup of G.

11.4. Let G be the dihedral group $D_4$ of symmetries of a square. Is the action of G on the
vertices a faithful action? on the diagonals?

11.5. A group G operates faithfully on a set S of five elements, and there are two orbits, one of
order 3 and one of order 2. What are the possible groups?
Hint: Map G to a product of symmetric groups.

11.6. Let $F = \mathbb{F}_3$. There are four one-dimensional subspaces of the space of column vectors
$F^2$. List them. Left multiplication by an invertible matrix permutes these subspaces. Prove
that this operation defines a homomorphism $\varphi: GL_2(F) \to S_4$. Determine the kernel and
image of this homomorphism.

11.7. For each of the following groups, find the smallest integer n such that the group has a
faithful operation on a set of order n: (a) $D_4$, (b) $D_6$, (c) the quaternion group H.

11.8. Find a bijective correspondence between the multiplicative group $\mathbb{F}_p^\times$ and the set of
automorphisms of a cyclic group of order p.

11.9. Three sheets of rectangular paper $S_1, S_2, S_3$ are made into a stack. Let G be the group of
all symmetries of this configuration, including symmetries of the individual sheets as well
as permutations of the set of sheets. Determine the order of G, and the kernel of the map
$G \to S_3$ defined by the permutations of the set $\{S_1, S_2, S_3\}$.


Section 12 Finite Subgroups of the Rotation Group 


12.1. Explain why the groups of symmetries of the dodecahedron and the icosahedron are
isomorphic.

12.2. Describe the orbits of poles for the group of rotations of an octahedron.

12.3. Let O be the group of rotations of a cube, and let S be the set of four diagonal lines
connecting opposite vertices. Determine the stabilizer of one of the diagonals.

12.4. Let G = O be the group of rotations of a cube, and let H be the subgroup carrying one of
the two inscribed tetrahedra to itself (see Exercise 3.4). Prove that H = T.

12.5. Prove that the icosahedral group has a subgroup of order 10.

12.6. Determine all subgroups of (a) the tetrahedral group, (b) the icosahedral group.

12.7. The 12 points $(\pm 1, \pm a, 0)^t$, $(0, \pm 1, \pm a)^t$, $(\pm a, 0, \pm 1)^t$ form the vertices of a regular
icosahedron if a > 1 is chosen suitably. Verify this, and determine a.

*12.8. Prove the crystallographic restriction for three-dimensional crystallographic groups:
A rotational symmetry of a crystal has order 2, 3, 4, or 6.


Miscellaneous Problems 


*M.1. Let G be a two-dimensional crystallographic group such that no element $g \neq 1$ fixes any
point of the plane. Prove that G is generated by two translations, or else by one translation
and one glide.

M.2. (a) Prove that the set Aut G of automorphisms of a group G forms a group, the law of
composition being composition of functions.
(b) Prove that the map $\varphi: G \to \operatorname{Aut} G$ defined by $g \rightsquigarrow$ (conjugation by g) is a
homomorphism, and determine its kernel.
(c) The automorphisms that are obtained as conjugation by a group element are called
inner automorphisms. Prove that the set of inner automorphisms, the image of $\varphi$, is a
normal subgroup of the group Aut G.

M.3. Determine the groups of automorphisms (see Exercise M.2) of the group
(a) $C_4$, (b) $C_6$, (c) $C_2 \times C_2$, (d) $D_4$, (e) the quaternion group H.

*M.4. With coordinates $x_1, \ldots, x_n$ in $\mathbb{R}^n$ as usual, the set of points defined by the
inequalities $-1 \leq x_i \leq +1$, for i = 1, ..., n, is an n-dimensional hypercube $C_n$. The
1-dimensional hypercube is a line segment and the 2-dimensional hypercube is a square.
The 4-dimensional hypercube has eight face cubes, the 3-dimensional cubes defined by
$\{x_i = 1\}$ and by $\{x_i = -1\}$, for i = 1, 2, 3, 4, and it has 16 vertices (±1, ±1, ±1, ±1).

Let $G_n$ denote the subgroup of the orthogonal group $O_n$ of elements that send the
hypercube to itself, the group of symmetries of $C_n$, including the orientation-reversing
symmetries. Permutations of the coordinates and sign changes are among the
elements of $G_n$.
(a) Use the counting formula and induction to determine the order of the group $G_n$.
(b) Describe $G_n$ explicitly, and identify the stabilizer of the vertex (1, ..., 1). Check your
answer by showing that $G_2$ is isomorphic to the dihedral group $D_4$.

*M.5. (a) Find a way to determine the area of one of the hippo heads that make up the first
pattern in Figure 6.6.2. Do the same for one of the fleurs-de-lys in the pattern at the
bottom of the figure.
(b) A fundamental domain D for a plane crystallographic group is a bounded region of the
plane such that the images gD, g in G, cover the plane exactly once, without overlap.
Find two noncongruent fundamental domains for the group of symmetries of the hippo
pattern. Do the same for the fleur-de-lys pattern.
(c) Prove that if D and D’ are fundamental domains for the same pattern, then D can be
cut into finitely many pieces and reassembled to form D’.
(d) Find a formula relating the area of a fundamental domain to the order of the point
group of the pattern.

*M.6. Let G be a discrete subgroup of M. Choose a point p in the plane whose stabilizer in G
is trivial, and let S be the orbit of p. For every point q of S other than p, let $\ell_q$ be the line
that is the perpendicular bisector of the line segment [p, q], and let $H_q$ be the half plane
that is bounded by $\ell_q$ and that contains p. Prove that $D = \bigcap H_q$ is a fundamental domain
for G (see Exercise M.5).

*M.7. Let G be a finite group operating on a finite set S. For each element g of G, let $S^g$ denote
the subset of elements of S fixed by g: $S^g = \{s \in S \mid gs = s\}$, and let $G_s$ be the stabilizer
of s.
(a) We may imagine a true-false table for the assertion that gs = s, say with rows indexed
by elements of G and columns indexed by elements of S. Construct such a table for
the action of the dihedral group $D_3$ on the vertices of a triangle.
(b) Prove the formula $\sum_{s \in S} |G_s| = \sum_{g \in G} |S^g|$.
(c) Prove Burnside’s Formula: $|G| \cdot (\text{number of orbits}) = \sum_{g \in G} |S^g|$.

M.8. There are $70 = \binom{8}{4}$ ways to color the edges of an octagon, with four black and four white.
The group $D_8$ operates on this set of 70, and the orbits represent equivalent colorings. Use
Burnside’s Formula (see Exercise M.7) to count the number of equivalence classes.


CHAPTER 7


More Group Theory


The more to do or to prove, the easier the doing or the proof. 
—James Joseph Sylvester 
We discuss three topics in this chapter: conjugation, the most important group operation;
the Sylow Theorems, which describe subgroups of prime power order in a finite group; and
generators and relations for a group.


7.1. CAYLEY’S THEOREM 


Every group G operates on itself in several ways, left multiplication being one of them:


(7.1.1) $G \times G \to G, \quad (g, x) \rightsquigarrow gx.$


This is a transitive operation: there is just one orbit. The stabilizer of any element is the
trivial subgroup $\langle 1 \rangle$, so the operation is faithful, and the permutation representation


(7.1.2) $G \to \operatorname{Perm}(G), \quad g \rightsquigarrow m_g$ (left multiplication by g)


defined by this operation is injective (see Section 6.11). 


Theorem 7.1.3 Cayley’s Theorem. Every finite group is isomorphic to a subgroup of a
permutation group. A group of order n is isomorphic to a subgroup of the symmetric
group $S_n$.


Proof. Since the operation by left multiplication is faithful, G is isomorphic to its image in
Perm(G). If G has order n, Perm(G) is isomorphic to $S_n$. □


Cayley’s Theorem is interesting, but it is difficult to use because the order of $S_n$ is
usually too large in comparison with n.
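To make the permutation representation concrete, here is a small Python sketch for the group $S_3$, whose six elements are stored as tuples. The helper names `compose` and `left_mult` are illustrative assumptions. Each element g is sent to the permutation $m_g$ of the index set {0, ..., 5} obtained by multiplying every group element on the left by g.

```python
from itertools import permutations

def compose(p, q):
    """Composition p∘q of permutations stored as tuples: (p∘q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

G = list(permutations(range(3)))            # the six elements of S3
index = {g: i for i, g in enumerate(G)}     # number the elements 0..5

def left_mult(g):
    """The permutation m_g of {0, ..., 5} induced by x -> g x."""
    return tuple(index[compose(g, x)] for x in G)

rep = {g: left_mult(g) for g in G}

# Injective: distinct group elements give distinct permutations of six indices.
injective = len(set(rep.values())) == len(G)

# Homomorphism: m_{gh} is the composition of m_g and m_h.
homomorphism = all(rep[compose(g, h)] == compose(rep[g], rep[h])
                   for g in G for h in G)
print(injective, homomorphism)
```

The image is a subgroup of order 6 inside a symmetric group of order 6! = 720, which illustrates how much larger $S_n$ is than the group it contains.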


7.2. THE CLASS EQUATION 


Conjugation, the operation of G on itself defined by


(7.2.1) $(g, x) \rightsquigarrow gxg^{-1}$


is more subtle and more important than left multiplication. Obviously, we shouldn’t use
multiplicative notation for this operation. We’ll verify the associative law (6.7.1) for the
operation, using $g * x$ as a temporary notation for the conjugate $gxg^{-1}$:


$(gh) * x = (gh)x(gh)^{-1} = ghxh^{-1}g^{-1} = g(h * x)g^{-1} = g * (h * x).$

Having checked the axiom, we return to the usual notation $gxg^{-1}$.


• The stabilizer of an element x of G for the operation of conjugation is called the centralizer
of x. It is often denoted by Z(x):


(7.2.2) $Z(x) = \{g \in G \mid gxg^{-1} = x\} = \{g \in G \mid gx = xg\}.$

The centralizer of x is the set of elements that commute with x.


• The orbit of x for conjugation is called the conjugacy class of x, and is often denoted by
C(x). It consists of all of the conjugates $gxg^{-1}$:


(7.2.3) $C(x) = \{x' \in G \mid x' = gxg^{-1} \text{ for some } g \text{ in } G\}.$

The counting formula (6.9.2) tells us that

(7.2.4) $|G| = |Z(x)| \cdot |C(x)|$, that is, |G| = |centralizer| · |conj. class|.

The center Z of a group G was defined in Chapter 2. It is the set of elements that
commute with every element of the group: $Z = \{z \in G \mid zy = yz \text{ for all } y \text{ in } G\}$.


Proposition 7.2.5


(a) The centralizer Z(x) of an element x of G contains x, and it contains the center Z. 


(b) An element x of G is in the center if and only if its centralizer Z(x) is the whole group 
G, and this happens if and only if the conjugacy class C(x) consists of the element x 
alone. □


Since the conjugacy classes are orbits for a group operation, they partition the group. 
This fact gives us the class equation of a finite group: 


(7.2.6) $|G| = \sum_{\text{conjugacy classes } C} |C|.$

If we number the conjugacy classes, writing them as $C_1, \ldots, C_k$, the class equation reads

(7.2.7) $|G| = |C_1| + \cdots + |C_k|.$


The conjugacy class of the identity element 1 consists of that element alone. It seems natural
to list that class first, so that $|C_1| = 1$. The other occurrences of 1 on the right side of the class
equation correspond to the elements of the center Z of G. Note also that each term on the
right side divides the left side, because it is the order of an orbit.


(7.2.8) The numbers on the right side of the class equation divide the
order of the group, and at least one of them is equal to 1.

This is a strong restriction on the combinations of integers that may occur in such an
equation.


The symmetric group $S_3$ has order 6. With our usual notation, the element x has order
3. Its centralizer Z(x) contains x, so its order is 3 or 6. Since $yx = x^2 y$, x is not in the center
of the group, and |Z(x)| = 3. It follows that $Z(x) = \langle x \rangle$, and the counting formula (7.2.4)
shows that the conjugacy class C(x) has order 2. Similar reasoning shows that the conjugacy
class C(y) of the element y has order 3. The class equation of the symmetric group $S_3$ is


(7.2.9) 6=1+2+3. 
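The same numbers can be obtained by direct enumeration. The following Python sketch models $S_3$ as the permutations of {0, 1, 2}; the particular tuples chosen for x (a 3-cycle) and y (a transposition) and the helper names are illustrative assumptions. It computes conjugacy classes and centralizers and checks the counting formula (7.2.4) for every element.

```python
from itertools import permutations

def compose(p, q):
    """Composition p∘q of permutations stored as tuples: (p∘q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugacy_class(G, x):
    """The set of conjugates g x g^{-1}, g in G."""
    return {compose(compose(g, x), inverse(g)) for g in G}

def centralizer(G, x):
    """The elements of G that commute with x."""
    return [g for g in G if compose(g, x) == compose(x, g)]

S3 = list(permutations(range(3)))
x = (1, 2, 0)   # a 3-cycle, playing the role of x
y = (1, 0, 2)   # a transposition, playing the role of y

classes = {frozenset(conjugacy_class(S3, g)) for g in S3}
sizes = sorted(len(c) for c in classes)
print(sizes, len(centralizer(S3, x)), len(centralizer(S3, y)))

# Counting formula (7.2.4): |G| = |Z(g)| * |C(g)| for every g.
formula_holds = all(
    len(centralizer(S3, g)) * len(conjugacy_class(S3, g)) == len(S3) for g in S3
)
```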


As we see, the counting formula helps to determine the class equation. One can
determine the order of a conjugacy class directly, or one can compute the order of its
centralizer. The centralizer, being a subgroup, has more structure, and computing its order is
often the better way. We will see a case in which it is easier to determine the conjugacy classes
in the next section, but let’s look at another case in which one should use the centralizer.

Let G be the special linear group $SL_2(\mathbb{F}_3)$ of matrices of determinant 1 with entries in
the field $\mathbb{F}_3$. The order of this group is 24 (see Chapter 3, Exercise 5.4). To start computing
the class equation by listing the elements of G would be incredibly boring. It is better to
begin by computing the centralizers of a few matrices A. This is done by solving the equation
PA = AP, for the matrix P. It is easier to use this equation, rather than $PAP^{-1} = A$. For
instance, let

$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ and $P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$


The equation PA = AP imposes the conditions b = −c and a = d, and then the equation
det P = 1 becomes $a^2 + c^2 = 1$. This equation has four solutions in $\mathbb{F}_3$: a = ±1, c = 0 and
a = 0, c = ±1. So |Z(A)| = 4 and |C(A)| = 6. This gives us a start for the class equation:
24 = 1 + 6 + ⋯. To finish the computation, one needs to compute centralizers of a few more
matrices. Since conjugate elements have the same characteristic polynomial, one can begin
by choosing elements with different characteristic polynomials.

The class equation of SL2(F3) is 


(7.2.10) 24 = 1 + 1 + 4 + 4 + 4 + 4 + 6.
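Although listing the elements by hand would be tedious, a machine does not mind. The sketch below is a brute-force check, not the method recommended in the text: it stores a matrix with rows (a, b) and (c, d) as the tuple (a, b, c, d) with entries reduced mod 3, and the helper names `mul` and `inv` are illustrative assumptions. It recomputes the centralizer of A and the sizes of all conjugacy classes.

```python
from itertools import product

p = 3  # we work in the field F_3

def mul(A, B):
    """Product of 2x2 matrices stored as (a, b, c, d), entries mod p."""
    (a, b, c, d), (e, f, g, h) = A, B
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)

def inv(A):
    """Inverse of a matrix of determinant 1: its adjugate."""
    a, b, c, d = A
    return (d % p, -b % p, -c % p, a % p)

# All matrices of determinant 1 over F_3.
G = [M for M in product(range(p), repeat=4)
     if (M[0] * M[3] - M[1] * M[2]) % p == 1]

A = (0, p - 1, 1, 0)  # rows (0, -1) and (1, 0)
centralizer_A = [P for P in G if mul(P, A) == mul(A, P)]

# Partition G into conjugacy classes.
classes, seen = [], set()
for X in G:
    if X not in seen:
        cls = {mul(mul(P, X), inv(P)) for P in G}
        classes.append(cls)
        seen |= cls

sizes = sorted(len(c) for c in classes)
print(len(G), len(centralizer_A), sizes)
```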


7.3. p-GROUPS 


The class equation has several applications to groups whose orders are positive powers of a 
prime p. They are called p-groups. 


Proposition 7.3.1 The center of a p-group is not the trivial group. 


Proof. Say that $|G| = p^e$ with $e \geq 1$. Every term on the right side of the class equation
divides $p^e$, so it is a power of p too, possibly $p^0 = 1$. The positive powers of p are divisible
by p. If the class $C_1$ of the identity made the only contribution of 1 to the right side, the
equation would read

$p^e = 1 + \sum (\text{multiples of } p).$


This is impossible, so there must be more 1’s on the right. The center is not trivial. □
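As a concrete illustration (an example chosen here, not taken from the proof), consider the dihedral group $D_4$ of order $8 = 2^3$, with the usual generators x and y satisfying $x^4 = 1$, $y^2 = 1$, $yx = x^{-1}y$. Its conjugacy classes can be listed by hand, and the class equation shows the extra 1 that the argument predicts:

```latex
\[
\{1\},\quad \{x^2\},\quad \{x, x^3\},\quad \{y, x^2 y\},\quad \{xy, x^3 y\}
\]
\[
8 = 1 + 1 + 2 + 2 + 2
\]
```

The two 1’s correspond to the center $\{1, x^2\}$. With only a single 1, the right side would be 1 plus a sum of even numbers, which is odd and so could not equal 8.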


198 Chapter7 More Group Theory 


A similar argument can be used to prove the following theorem for operations of
p-groups. We’ll leave its proof as an exercise.


Theorem 7.3.2 Fixed Point Theorem. Let G be a p-group, and let S be a finite set on which
G operates. If the order of S is not divisible by p, there is a fixed point for the operation of
G on S, that is, an element s whose stabilizer is the whole group. □


Proposition 7.3.3 Every group of order $p^2$ is abelian.


Proof. Let G be a group of order $p^2$. According to the previous proposition, its center Z is
not the trivial group. So the order of Z must be p or $p^2$. If the order of Z is $p^2$, then Z = G,
and G is abelian as the proposition asserts. Suppose that the order of Z is p, and let x be an
element of G that is not in Z. The centralizer Z(x) contains x as well as Z, so it is strictly
larger than Z. Since |Z(x)| divides |G|, it must be equal to $p^2$, and therefore Z(x) = G.
This means that x commutes with every element of G, so it is in the center after all, which is
a contradiction. Therefore the center cannot have order p. □


Corollary 7.3.4 A group of order p> is either cyclic. or the product of two cyclic groups of 
order p. 


Proof. Let G be a group of order p^2. If G contains an element of order p^2, it is cyclic. If
not, every element of G different from 1 has order p. We choose elements x and y of order
p such that y is not in the subgroup <x>. Proposition 2.11.4 shows that G is isomorphic to
the product <x> × <y>. □


The number of isomorphism classes of groups of order p^e increases rapidly with e.
There are five isomorphism classes of groups of order eight, 14 isomorphism classes of groups
of order 16, and 51 isomorphism classes of groups of order 32.


7.4 THE CLASS EQUATION OF THE ICOSAHEDRAL GROUP 


In this section we use the conjugacy classes in the icosahedral group I, the group of rotational
symmetries of a dodecahedron, to study this interesting group. You may want to refer to a
model of a dodecahedron or to an illustration while thinking about this.

Let θ = 2π/3. The icosahedral group contains the rotation by θ about a vertex v. This
rotation has spin (v, θ), so we denote it by ρ_(v,θ). The 20 vertices form an I-orbit, and
if v’ is another vertex, then ρ_(v,θ) and ρ_(v’,θ) are conjugate elements of I. This follows from
Corollary 5.1.28(b). The vertices form an orbit of order 20, so all of the rotations ρ_(v,θ) are
conjugate. They are distinct, because the only spin that defines the same rotation as (v, θ) is
(−v, −θ) and −θ ≠ θ. So these rotations form a conjugacy class of order 20.

Next, I contains rotations with angle 2π/5 about the center of a face, and the 12 faces
form an orbit. Reasoning as above, we find a conjugacy class of order 12. Similarly, the
rotations with angle 4π/5 form a conjugacy class of order 12.

Finally, I contains a rotation with angle π about the center of an edge. There are 30
edges, which gives us 30 spins (e, π). But π and −π are the same angle. If e is the center of
an edge, so is −e, and the spins (e, π) and (−e, −π) represent the same rotation. This
conjugacy class contains only 15 distinct rotations.

The class equation of the icosahedral group is

(7.4.1)    60 = 1 + 20 + 12 + 12 + 15.


Note: Calling (v, θ) and (e, π) spins isn’t accurate, because v and e can’t both have unit
length. But this is obviously not an important point.


Simple Groups 


A group G is simple if it is not the trivial group and if it contains no proper normal
subgroup, that is, no normal subgroup other than <1> and G. (This use of the word simple does
not mean “uncomplicated.” Its meaning here is roughly “not compound.”) Cyclic groups of
prime order contain no proper subgroup at all; they are therefore simple groups. All other
groups except the trivial group contain proper subgroups, though not necessarily proper
normal subgroups.

The proof of the following lemma is straightforward. 


Lemma 7.4.2 Let N be a normal subgroup of a group G.


(a) If N contains an element x, then it contains the conjugacy class C(x) of x.
(b) N is a union of conjugacy classes.
(c) The order of N is the sum of the orders of the conjugacy classes that it contains. □


We now use the class equation to prove the following theorem. 
Theorem 7.4.3. The icosahedral group I is a simple group.


Proof. The order of a proper normal subgroup of the icosahedral group is a proper divisor 
of 60, and according to the lemma, it is also the sum of some of the terms on the right side of 
the class equation (7.4.1), including the term 1, which is the order of the conjugacy class of 
the identity element. There is no integer that satisfies both of those requirements, and this 
proves the theorem. □
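The assertion that no integer satisfies both requirements is a finite check, and it can be carried out mechanically. The sketch below is one way to do this, under the assumption that we simply enumerate all sums of terms of (7.4.1) that include the term 1; the variable names are our own.

```python
from itertools import combinations

group_order = 60
class_orders = [20, 12, 12, 15]   # the terms of (7.4.1) other than the term 1

# Every normal subgroup has order 1 + (sum of some of the other terms).
candidates = set()
for r in range(len(class_orders) + 1):
    for subset in combinations(class_orders, r):
        candidates.add(1 + sum(subset))

# Orders of proper normal subgroups: divisors of 60 other than 1 and 60.
proper = sorted(t for t in candidates
                if 1 < t < group_order and group_order % t == 0)

print(sorted(candidates))
print(proper)
```

The possible sums are 1, 13, 16, 21, 25, 28, 33, 36, 40, 45, 48, and 60. Only 1 and 60 divide 60, so the list `proper` is empty.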


The property of being simple can be useful because one may run across normal 
subgroups, as the next theorem illustrates. 


Theorem 7.4.4. The icosahedral group is isomorphic to the alternating group A_5. Therefore
A_5 is a simple group.


Proof. To describe this isomorphism, we need to find a set S of five elements on which I
operates. This is rather subtle, but the five cubes that can be inscribed into a dodecahedron,
one of which is shown below, form such a set.

The icosahedral group operates on this set of five cubes, and this operation defines
a homomorphism φ: I → S_5, the associated permutation representation. We show that φ
defines an isomorphism from I to the alternating group A_5. To do this, we use the fact that
I is a simple group, but the only information that we need about the operation is that it isn’t
trivial.

(7.4.5)    One of the Cubes Inscribed in a Dodecahedron.


The kernel of φ is a normal subgroup of I. Since I is a simple group, the kernel is
either the trivial group <1> or the whole group I. If the kernel were the whole group,
the operation of I on the set of five cubes would be the trivial operation, which it is not.
Therefore ker φ = <1>. This shows that φ is injective. It defines an isomorphism from I to
its image in S_5.

Next, we compose the homomorphism φ with the sign homomorphism σ: S_5 → {±1},
obtaining a homomorphism σφ: I → {±1}. If this homomorphism were surjective, its kernel
would be a proper normal subgroup of I. This is not the case because I is simple. Therefore
the restriction is the trivial homomorphism, which means that the image of φ is contained in
the kernel of σ, the alternating group A_5. Both I and A_5 have order 60, and φ is injective.
So the image of φ, which is isomorphic to I, is A_5. □
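As a consistency check on the theorem, we can compute the orders of the conjugacy classes of A_5 directly and compare them with the terms of the class equation (7.4.1). The sketch below assumes that A_5 is listed as the even permutations of {0, ..., 4}; the helper names are our own. It does not construct the isomorphism, it only confirms that the two class equations agree.

```python
from itertools import permutations

def compose(a, b):
    """Return the permutation 'a after b' (apply b first)."""
    return tuple(a[b[i]] for i in range(len(a)))

def inverse(a):
    """Return the inverse permutation of a."""
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def is_even(p):
    """A permutation is even when its number of inversions is even."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

def class_sizes(group):
    """Return the sorted list of orders of the conjugacy classes."""
    seen, sizes = set(), []
    for x in group:
        if x not in seen:
            cls = {compose(compose(g, x), inverse(g)) for g in group}
            seen |= cls
            sizes.append(len(cls))
    return sorted(sizes)

A5 = [p for p in permutations(range(5)) if is_even(p)]
sizes = class_sizes(A5)
print(len(A5), sizes)
```

The orders found are 1, 12, 12, 15, 20, the same terms as in (7.4.1). The two classes of order 12 consist of 5-cycles, which correspond to the rotations about faces.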


7.5 CONJUGATION IN THE SYMMETRIC GROUP 


The least confusing way to describe conjugation in the symmetric group is to think of 
relabeling the indices. If the given indices are 1,2,3,4,5, and if we relabel them as 
a, b,c, d, e, respectively, the permutation p = (134) (25) is changed to (acd) (be). 

To write a formula for this procedure, we let φ: I → L denote the relabeling map
that goes from the set I of indices to the set L of letters: φ(1) = a, φ(2) = b, etc. Then the
relabeled permutation is φ ∘ p ∘ φ^{-1}. This is explained as follows:


First map letters to indices using φ^{-1}.

Next, permute the indices by p.

Finally, map indices back to letters using φ.

We can use a permutation q of the indices to relabel in the same way. The result, the
conjugate p’ = qpq^{-1}, will be a new permutation of the same set of indices. For example, if
we use q = (1452) to relabel, we will get


    qpq^{-1} = (1452) ∘ (134)(25) ∘ (2541) = (435)(12) = p’.
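The example can be checked by composing the three permutations as maps. The sketch below assumes the convention used here, that the rightmost permutation is applied first, and it stores a permutation of {1, ..., 5} as a Python dictionary; the helper names `from_cycles`, `to_cycles`, `compose`, and `inverse` are our own.

```python
def from_cycles(cycles, n):
    """Build a permutation of {1, ..., n} (as a dict) from a list of cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for k, a in enumerate(cyc):
            perm[a] = cyc[(k + 1) % len(cyc)]
    return perm

def compose(a, b):
    """Return the permutation 'a after b' (apply b first)."""
    return {i: a[b[i]] for i in b}

def inverse(a):
    """Return the inverse permutation of a."""
    return {v: k for k, v in a.items()}

def to_cycles(perm):
    """Return the cycles of length > 1, each starting at its smallest index."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = perm[i]
        if len(cyc) > 1:
            cycles.append(tuple(cyc))
    return cycles

p = from_cycles([(1, 3, 4), (2, 5)], 5)
q = from_cycles([(1, 4, 5, 2)], 5)

conjugate = compose(compose(q, p), inverse(q))   # q p q^{-1}
relabeled = from_cycles([(q[1], q[3], q[4]), (q[2], q[5])], 5)

print(to_cycles(conjugate))   # [(1, 2), (3, 5, 4)]
```

The cycle (3 5 4) is the same as (435), so the output agrees with the computation above. The dictionary `relabeled`, obtained by applying q to each index in the cycles of p, is equal to `conjugate`.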


There are two things to notice. First, the relabeling will produce a permutation whose
cycles have the same lengths as the original one. Second, by choosing the permutation q
suitably, we can obtain any other permutation that has cycles of those same lengths. If we
write one permutation above the other, ordered so that the cycles correspond, we can use
the result as a table to define q. For example, to obtain p’ = (435)(12) as a conjugate of
the permutation p = (134)(25), as we did above, we could write


(134) (25) 
(435) (12) 


The relabeling permutation q is obtained by reading this table down: 1 ↦ 4, etc.

Because a cycle can start from any of its indices, there will most often be several
permutations q that yield the same conjugate.

The next proposition sums up the discussion. 


Proposition 7.5.1 Two permutations p and p’ are conjugate elements of the symmetric 
group if and only if their cycle decompositions have the same orders. □


We use Proposition 7.5.1 to determine the class equation of the symmetric group S_4.
The cycle decomposition of a permutation gives us a partition of the set {1, 2, 3, 4}. The
orders of the subsets making a partition of four can be


    1, 1, 1, 1;    2, 1, 1;    2, 2;    3, 1;    4.


The permutations with cycles of these orders are the identity, the transpositions, the products 
of (disjoint) transpositions, the 3-cycles, and the 4-cycles, respectively. 

There are six transpositions, three products of transpositions, eight 3-cycles, and six
4-cycles. The proposition tells us that each of these sets forms one conjugacy class, so the
class equation of S_4 is


(7.5.2)    24 = 1 + 3 + 6 + 6 + 8.

A similar computation shows that the class equation of the symmetric group S_5 is

(7.5.3)    120 = 1 + 10 + 15 + 20 + 20 + 30 + 24.
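Both class equations can be recovered by sorting the permutations according to the orders of their cycles, as Proposition 7.5.1 suggests. The sketch below is a brute-force count, practical only for small n; the function names `cycle_type` and `class_equation` are our own.

```python
from collections import Counter
from itertools import permutations

def cycle_type(p):
    """Return the cycle lengths of p, largest first, 1-cycles included."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = p[i]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def class_equation(n):
    """Count the permutations in S_n having each cycle type."""
    return Counter(cycle_type(p) for p in permutations(range(n)))

s4 = class_equation(4)
s5 = class_equation(5)

for counts in (s4, s5):
    for kind, size in sorted(counts.items(), key=lambda item: item[1]):
        print(kind, size)
    print("total:", sum(counts.values()))
```

For S_4 the cycle types (2,1,1), (2,2), (3,1), (4) occur 6, 3, 8, 6 times. For S_5 the seven cycle types give the terms of (7.5.3).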


We saw in the previous section (7.4.4) that the alternating group A_5 is a simple group
because it is isomorphic to the icosahedral group I, which is simple. We now prove that most
alternating groups are simple.


Theorem 7.5.4 For every n ≥ 5, the alternating group A_n is a simple group.


To complete the picture we note that A_2 is the trivial group, A_3 is cyclic of order three, and
that A_4 is not simple. The group of order four that consists of the identity and the three
products of transpositions (12)(34), (13)(24), (14)(23) is a normal subgroup of S_4 and
of A_4 (see (2.5.13)(b)).


Lemma 7.5.5 


(a) For n ≥ 3, the alternating group A_n is generated by 3-cycles.
(b) For n ≥ 5, the 3-cycles form a single conjugacy class in the alternating group A_n.


Proof. (a) This is analogous to the method of row reduction. Say that an even permutation
p, not the identity, fixes m of the indices. We show that if we multiply p on the left by a
suitable 3-cycle q, the product qp will fix at least m + 1 indices. Induction on m will complete
the proof.

If p is not the identity, it will contain either a k-cycle with k ≥ 3, or a product
of two 2-cycles. It does not matter how we number the indices, so we may suppose that
p = (123···k)··· or p = (12)(34)···. Let q = (321). The product qp fixes the index 1 as
well as all indices fixed by p.


(b) Suppose that n ≥ 5, and let q = (123). According to Proposition 7.5.1, the 3-cycles
are conjugate in the symmetric group S_n. So if q’ is another 3-cycle, there is a permutation
p such that pqp^{-1} = q’. If p is an even permutation, then q and q’ are conjugate in A_n.
Suppose that p is odd. The transposition τ = (45) is in S_n because n ≥ 5, and τqτ^{-1} = q.
Then pτ is even, and (pτ)q(pτ)^{-1} = q’. □

We now proceed to the proof of the Theorem. Let N be a nontrivial normal subgroup
of the alternating group A_n with n ≥ 5. We must show that N is the whole group A_n. It
suffices to show that N contains a 3-cycle. If so, then (7.5.5)(b) will show that N contains
every 3-cycle, and (7.5.5)(a) will show that N = A_n.

We are given that N is a normal subgroup and that it contains a permutation x different
from the identity. Three operations are allowed: We may multiply, invert, and conjugate.
For example, if g is any element of A_n, then gxg^{-1} and x^{-1} are in N too. So is their product,
the commutator gxg^{-1}x^{-1}. And since g can be arbitrary, these commutators give us many
elements that must be in N.

Our first step is to note that a suitable power of x will have prime order, say order
ℓ. We may replace x by this power, so we may assume that x has order ℓ. Then the cycle
decomposition of x will consist of ℓ-cycles and 1-cycles.

Unfortunately, the rest of the proof requires looking separately at several cases. In each
of the cases, we compute a commutator gxg^{-1}x^{-1}, hoping to be led to a 3-cycle. Appropriate
elements can be found by experiment.


Case 1: x has order ℓ ≥ 5.


How the indices are numbered is irrelevant, so we may suppose that x contains the ℓ-cycle
(12345···ℓ), say x = (12345···ℓ)y, where y is a permutation of the remaining indices. Let
g = (432). Then


    gxg^{-1}x^{-1} = [(432)] ∘ [(12345···ℓ)y] ∘ [(234)] ∘ [y^{-1}(ℓ···54321)] = (245).

(The rightmost factor is the one that is applied first.) The commutator is a 3-cycle.

Case 2: x has order 3.

There is nothing to prove if x is a 3-cycle. If not, then x contains at least two 3-cycles, say
x = (123)(456)y. Let g = (432). Then gxg^{-1}x^{-1} = (15243). The commutator has order
5. We go back to Case 1.


Case 3a: x has order 2 and it contains a 1-cycle.

Since it is an even permutation, x must contain at least two 2-cycles, say x = (12)(34)(5)y.
Let g = (531). Then gxg^{-1}x^{-1} = (15243). The commutator has order 5, and we go back
to Case 1 again.


Case 3b: x has order ℓ = 2, and contains no 1-cycles.


Since n ≥ 5, x contains more than two 2-cycles. Say x = (12)(34)(56)y. Let g = (531).
Then gxg^{-1}x^{-1} = (153)(246). The commutator has order 3 and we go back to Case 2.

These are the possibilities for an even permutation of prime order, so the proof of the
theorem is complete. □
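Since the elements g were found by experiment, it is reassuring to recompute the four commutators. The sketch below does so under two assumptions: the permutation y of the remaining indices is taken to be the identity, and in Case 1 we try ℓ = 5 and ℓ = 7. Products are read with the rightmost factor applied first, and the helper names are our own.

```python
def from_cycles(cycles, n):
    """Build a permutation of {1, ..., n} (as a dict) from a list of cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for k, a in enumerate(cyc):
            perm[a] = cyc[(k + 1) % len(cyc)]
    return perm

def compose(a, b):
    """Return the permutation 'a after b' (apply b first)."""
    return {i: a[b[i]] for i in b}

def inverse(a):
    """Return the inverse permutation of a."""
    return {v: k for k, v in a.items()}

def to_cycles(perm):
    """Return the cycles of length > 1, each starting at its smallest index."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = perm[i]
        if len(cyc) > 1:
            cycles.append(tuple(cyc))
    return cycles

def commutator(g_cycles, x_cycles, n):
    """Return the cycles of g x g^{-1} x^{-1} as a permutation of {1, ..., n}."""
    g = from_cycles(g_cycles, n)
    x = from_cycles(x_cycles, n)
    c = compose(compose(compose(g, x), inverse(g)), inverse(x))
    return to_cycles(c)

case1_l5 = commutator([(4, 3, 2)], [(1, 2, 3, 4, 5)], 5)
case1_l7 = commutator([(4, 3, 2)], [(1, 2, 3, 4, 5, 6, 7)], 7)
case2 = commutator([(4, 3, 2)], [(1, 2, 3), (4, 5, 6)], 6)
case3a = commutator([(5, 3, 1)], [(1, 2), (3, 4)], 5)
case3b = commutator([(5, 3, 1)], [(1, 2), (3, 4), (5, 6)], 6)

print(case1_l5, case1_l7, case2, case3a, case3b)
```

The results are (245) in Case 1, (15243) in Cases 2 and 3a, and (153)(246) in Case 3b, as stated in the proof.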


7.6 NORMALIZERS 


We consider the orbit of a subgroup H of a group G for the operation of conjugation by G.
The orbit of [H] is the set of conjugate subgroups [gHg^{-1}], with g in G. The stabilizer of
[H] for this operation is called the normalizer of H, and is denoted by N(H):


(7.6.1)    N(H) = {g ∈ G | gHg^{-1} = H}.

The Counting Formula reads

(7.6.2)    |G| = |N(H)| · (number of conjugate subgroups).

The number of conjugate subgroups is equal to the index [G : N(H)].


Proposition 7.6.3 Let H be a subgroup of a group G, and let N be the normalizer of H. 


(a) H is a normal subgroup of N.
(b) H is a normal subgroup of G if and only if N = G.
(c) |H| divides |N| and |N| divides |G|. □


For example, let H be the cyclic subgroup of order two of the symmetric group S_5 that
is generated by the element p = (12)(34). The conjugacy class C(p) contains the 15 pairs
of disjoint transpositions, each of which generates a conjugate subgroup of H. The counting
formula shows that the normalizer N(H) has order eight: 120 = 8 · 15.
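The counts in this example are small enough to verify by listing all 120 elements of S_5. The sketch below assumes that the indices are renumbered 0, ..., 4, so that p = (12)(34) becomes the tuple (1, 0, 3, 2, 4); the helper names are our own.

```python
from itertools import permutations

def compose(a, b):
    """Return the permutation 'a after b' (apply b first)."""
    return tuple(a[b[i]] for i in range(len(a)))

def inverse(a):
    """Return the inverse permutation of a."""
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def conjugate_subgroup(g, H):
    """Return the subgroup g H g^{-1} as a frozenset."""
    g_inv = inverse(g)
    return frozenset(compose(compose(g, h), g_inv) for h in H)

S5 = list(permutations(range(5)))
identity = tuple(range(5))
p = (1, 0, 3, 2, 4)               # the permutation (12)(34), with indices shifted by 1
H = frozenset({identity, p})

normalizer = [g for g in S5 if conjugate_subgroup(g, H) == H]
conjugates = {conjugate_subgroup(g, H) for g in S5}

print(len(normalizer), len(conjugates), len(normalizer) * len(conjugates))
```

The normalizer has order 8 and there are 15 conjugate subgroups, in agreement with the Counting Formula (7.6.2).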


7.7 THE SYLOW THEOREMS


The Sylow Theorems describe the subgroups of prime power order of an arbitrary finite
group. They are named after the Norwegian mathematician Ludwig Sylow, who discovered
them in the 19th century.

Let G be a group of order n, and let p be a prime integer that divides n. Let p^e denote
the largest power of p that divides n, so that

(7.7.1)    n = p^e m,

where m is an integer not divisible by p. Subgroups H of G of order p^e are called Sylow
p-subgroups of G. A Sylow p-subgroup is a p-group whose index in the group isn’t divisible
by p.


Theorem 7.7.2 First Sylow Theorem. A finite group whose order is divisible by a prime p
contains a Sylow p-subgroup.


Proofs of the Sylow Theorems are at the end of the section. 


Corollary 7.7.3 A finite group whose order is divisible by a prime p contains an element of 
order p. 


Proof. Let G be such a group, and let H be a Sylow p-subgroup of G. Then H contains an
element x different from 1. The order of x divides the order of H, so it is a positive power
of p, say p^k. Then x^(p^{k-1}) has order p. □


This corollary isn’t obvious. We already know that the order of any element divides the 
order of the group, but we might imagine a group of order 6, for example, made up of the 
identity 1 and five elements of order 2. No such group exists. A group of order 6 must contain 
an element of order 3 and an element of order 2. 


The remaining Sylow Theorems give additional information about the Sylow subgroups.


Theorem 7.7.4 Second Sylow Theorem. Let G be a finite group whose order is divisible by
a prime p.

(a) The Sylow p-subgroups of G are conjugate subgroups. 

(b) Every subgroup of G that is a p-group is contained in a Sylow p-subgroup. 


A conjugate subgroup of a Sylow p-subgroup will be a Sylow p-subgroup too. 


Corollary 7.7.5 A group G has just one Sylow p-subgroup H if and only if that subgroup is
normal. □


Theorem 7.7.6 Third Sylow Theorem. Let G be a finite group whose order n is divisible
by a prime p. Say that n = p^e m, where p does not divide m, and let s denote the number
of Sylow p-subgroups. Then s divides m and s is congruent to 1 modulo p: s = kp + 1 for
some integer k ≥ 0.


Before proving the Sylow theorems, we will use them to classify groups of orders 6, 15, 
and 21. These examples show the power of the theorems, but the classification of groups of 
order n is not easy when n has many factors. There are just too many possibilities. 
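The numerical content of the Third Sylow Theorem is easy to tabulate. The sketch below lists, for a given order n and prime p, the integers s that divide m and are congruent to 1 modulo p. The function name `sylow_counts` is our own, and the theorem only says that the actual number of Sylow p-subgroups is one of the values listed.

```python
def sylow_counts(n, p):
    """Return the values of s allowed by the Third Sylow Theorem.

    Writes n = p^e * m with p not dividing m, then lists the divisors s of m
    with s congruent to 1 modulo p.
    """
    e, m = 0, n
    while m % p == 0:
        m //= p
        e += 1
    if e == 0:
        raise ValueError("p must divide n")
    return [s for s in range(1, m + 1) if m % s == 0 and s % p == 1]

for n, p in [(15, 3), (15, 5), (6, 3), (6, 2), (21, 7), (21, 3), (12, 2), (12, 3)]:
    print(n, p, sylow_counts(n, p))
```

For order 15 both lists reduce to the single value 1. For order 6 the Sylow 2-subgroups number 1 or 3, and for order 21 the Sylow 3-subgroups number 1 or 7.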


Proposition 7.7.7 


(a) Every group of order 15 is cyclic. 

(b) There are two isomorphism classes of groups of order 6, the class of the cyclic group C_6
and the class of the symmetric group S_3.

(c) There are two isomorphism classes of groups of order 21: the class of the cyclic group
C_21, and the class of a group G generated by two elements x and y that satisfy the
relations x^7 = 1, y^3 = 1, yx = x^2 y.


Proof. (a) Let G be a group of order 15. According to the Third Sylow Theorem, the number
of its Sylow 3-subgroups divides 5 and is congruent to 1 modulo 3. The only such integer is 1.
Therefore there is one Sylow 3-subgroup, say H, and it is a normal subgroup. For similar
reasons, there is just one Sylow 5-subgroup, say K, and it is normal. The subgroup H is cyclic
of order 3, and K is cyclic of order 5. The intersection H ∩ K is the trivial group. Proposition
2.11.4(d) tells us that G is isomorphic to the product group H × K. So all groups of order 15
are isomorphic to the product C_3 × C_5 of cyclic groups and to each other. The cyclic group
C_15 is one such group, so all groups of order 15 are cyclic.


(b) Let G be a group of order 6. The First Sylow Theorem tells us that G contains a Sylow
3-subgroup H, a cyclic group of order 3, and a Sylow 2-subgroup K, cyclic of order 2.
The Third Sylow Theorem tells us that the number of Sylow 3-subgroups divides 2 and is
congruent to 1 modulo 3. The only such integer is 1. So there is one Sylow 3-subgroup H,
and it is a normal subgroup. The same theorem also tells us that the number of Sylow
2-subgroups divides 3 and is congruent to 1 modulo 2. That number is either 1 or 3.


Case 1: Both H and K are normal subgroups. 


As in the previous example, G is isomorphic to the product group H × K, which is
abelian. All abelian groups of order 6 are cyclic.


Case 2: G contains 3 Sylow 2-subgroups, say K_1, K_2, K_3.


The group G operates by conjugation on the set S = {[K_1], [K_2], [K_3]} of order
three, and this gives us a homomorphism φ: G → S_3 from G to the symmetric group, the
associated permutation representation (6.11.2). The Second Sylow Theorem tells us that
the operation on S is transitive, so the stabilizer in G of the element [K_i], which is the
normalizer N(K_i), has order 2. It is equal to K_i. Since K_1 ∩ K_2 = {1}, the identity is the
only element of G that fixes all elements of S. The operation is faithful, and the permutation
representation φ is injective. Since G and S_3 have the same order, φ is an isomorphism.


(c) Let G be a group of order 21. The Third Sylow Theorem shows that the Sylow 7-subgroup
K must be normal, and that the number of Sylow 3-subgroups is 1 or 7. Let x be a generator
for K, and let y be a generator for one of the Sylow 3-subgroups H. Then x^7 = 1 and y^3 = 1,
so H ∩ K = {1}, and therefore the product map H × K → G is injective (2.11.4)(a). Since
G has order 21, the product map is bijective. The elements of G are the products x^i y^j with
0 ≤ i < 7 and 0 ≤ j < 3.

Since K is a normal subgroup, yxy^{-1} is an element of K, a power of x, say x^i, with i in
the range 1 ≤ i < 7. So the elements x and y satisfy the relations


(7.7.8)    x^7 = 1,  y^3 = 1,  yx = x^i y.


These relations are enough to determine the multiplication table for the group. However,
the relation y^3 = 1 restricts the possible exponents i, because it implies that y^3 x y^{-3} = x:

    x = y^3 x y^{-3} = y^2 x^i y^{-2} = y x^(i^2) y^{-1} = x^(i^3).


Therefore i^3 ≡ 1 modulo 7. This tells us that i must be 1, 2, or 4.

The exponent i = 3, for instance, would imply x = x^27 = x^6 = x^{-1}. Then x^2 = 1 and
also x^7 = 1, from which it follows that x = 1. The group defined by the relations (7.7.8) with
i = 3 is a cyclic group of order 3, generated by y.


Case 1: yxy^{-1} = x. Then x commutes with y. Both H and K are normal subgroups. As
before, G is isomorphic to a direct product of cyclic groups of orders 3 and 7, and is a cyclic
group.


Case 2: yxy^{-1} = x^2. As noted above, the multiplication table is determined. But we still
have to show that this group actually exists. This comes down to showing that the relations 
don’t cause the group to collapse, as happens when i = 3. We’ll learn a systematic method 
for doing this, the Todd-Coxeter Algorithm, in Section 7.11. Another way is to exhibit the 
group explicitly, for example as a group of matrices. Some experimentation is required to do 
this. 

Since the group we are looking for is supposed to contain an element of order 7, it is
natural to try to find suitable matrices with entries modulo 7. At least we can write down a
2 × 2 matrix with entries in F_7 that has order 7, namely the matrix x below. Then y can be
found by trial and error. The matrices


    x = [ 1  1 ]     and     y = [ 2  0 ]
        [ 0  1 ]                 [ 0  1 ]


with entries in F_7 satisfy the relations x^7 = 1, y^3 = 1, yx = x^2 y, and they generate a group
of order 21.
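The three relations, and the order of the group generated, can be checked by arithmetic modulo 7. The sketch below assumes the matrices x = [1 1; 0 1] and y = [2 0; 0 1] over F_7, stored as tuples of rows; the helper names `mul`, `power`, and `generate` are our own.

```python
P = 7
IDENTITY = ((1, 0), (0, 1))

def mul(a, b):
    """Multiply two 2x2 matrices with entries modulo P."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) % P for j in range(2))
        for i in range(2)
    )

def power(a, n):
    """Return the n-th power of the matrix a, for n >= 0."""
    result = IDENTITY
    for _ in range(n):
        result = mul(result, a)
    return result

def generate(gens):
    """Return the set of all products of the given generators."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

x = ((1, 1), (0, 1))
y = ((2, 0), (0, 1))

relations_hold = (
    power(x, 7) == IDENTITY
    and power(y, 3) == IDENTITY
    and mul(y, x) == mul(power(x, 2), y)
)
group = generate([x, y])
print(relations_hold, len(group))
```

The group consists of the matrices [a b; 0 1] with a in {1, 2, 4} and b in F_7, which accounts for the 21 elements. It is not abelian, because yx and xy are different matrices.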


Case 3: yxy^{-1} = x^4. Then y^2 x y^{-2} = x^16 = x^2. We note that y^2 is also an element of
order 3. So we may replace y by y^2, which is another generator for H. The result is that the
exponent 4 is replaced by 2, which puts us back in the previous case.


Thus there are two isomorphism classes of groups of order 21, as claimed. □


We use two lemmas in the proof of the first Sylow Theorem. 


Lemma 7.7.9 Let U be a subset of a group G. The order of the stabilizer Stab([U]) of [U]
for the operation of left multiplication by G on the set of its subsets divides both of the
orders |U| and |G|.


Proof. If H is a subgroup of G, the H-orbit of an element u of G for left multiplication by
H is the right coset Hu. Let H be the stabilizer of [U]. Then multiplication by H permutes
the elements of U, so U is partitioned into H-orbits, which are right cosets. Each coset has
order |H|, so |H| divides |U|. Because H is a subgroup, |H| divides |G|. □


Lemma 7.7.10 Let n be an integer of the form p^e m, where e > 0 and p does not divide m.
The number N of subsets of order p^e in a set of order n is not divisible by p.


Proof. The number N is the binomial coefficient


    N = (n choose p^e) = [n (n − 1) ··· (n − k) ··· (n − p^e + 1)] / [p^e (p^e − 1) ··· (p^e − k) ··· 1].


The reason that N ≢ 0 modulo p is that every time p divides a term (n − k) in the numerator
of N, it also divides the term (p^e − k) of the denominator the same number of times:
If we write k in the form k = p^i ℓ, where p does not divide ℓ, then i < e. Therefore
(p^e − k) and (n − k) = (p^e m − k) are both divisible by p^i but not by p^{i+1}. □
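The lemma is easy to test numerically for small values. The sketch below computes the binomial coefficient with the standard library function `math.comb` and checks that it is not divisible by p; the ranges of p, e, and m that are tried are an arbitrary choice of ours.

```python
from math import comb

def subset_count(p, e, m):
    """Return the number of subsets of order p^e in a set of order p^e * m."""
    return comb(p**e * m, p**e)

failures = []
for p in (2, 3, 5, 7):
    for e in (1, 2, 3):
        for m in range(1, 12):
            if m % p == 0:
                continue          # the lemma assumes that p does not divide m
            if subset_count(p, e, m) % p == 0:
                failures.append((p, e, m))

print(subset_count(2, 2, 3), subset_count(3, 1, 2), failures)
```

For instance, a set of order 12 has 495 subsets of order 4, an odd number, and a set of order 6 has 20 subsets of order 3, which is not divisible by 3. No failures are found in the range tested.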


Proof of the First Sylow Theorem. Let S be the set of all subsets of G of order p^e. One of
the subsets is a Sylow subgroup, but instead of finding it directly we look at the operation of
left multiplication by G on S. We will show that one of the subsets [U] of order p^e has a
stabilizer of order p^e. That stabilizer will be the subgroup we are looking for.


We decompose S into orbits for the operation of left multiplication, obtaining an
equation of the form


    N = |S| = Σ_{orbits O} |O|.


According to Lemma 7.7.10, p doesn’t divide N. So at least one orbit has an order that isn’t
divisible by p, say the orbit O_[U] of the subset [U]. Let H be the stabilizer of [U]. Lemma
7.7.9 tells us that the order of H divides the order of U, which is p^e. So |H| is a power of p.
We have |H| · |O_[U]| = |G| = p^e m, and |O_[U]| isn’t divisible by p. Therefore |O_[U]| = m
and |H| = p^e. So H is a Sylow p-subgroup. □


Proof of the Second Sylow Theorem. Suppose that we are given a p-subgroup K and a
Sylow p-subgroup H. We will show that some conjugate subgroup H’ of H contains K,
which will prove (b). If K is also a Sylow p-subgroup, it will be equal to the conjugate
subgroup H’, so (a) will be proved as well.

We choose a set C on which the group G operates, with these properties: p does not 
divide the order |C|, the operation is transitive, and C contains an element c whose stabilizer 
is H. The set of left cosets of H in G has these properties, so such a set exists. (We prefer 
not to clutter up the notation by explicit reference to cosets.) 

We restrict the operation of G on C to the p-group K. Since p doesn’t divide |C|, there
is a fixed point c’ for the operation of K. This is the Fixed Point Theorem 7.3.2. Since the
operation of G is transitive, c’ = gc for some g in G. The stabilizer of c’ is the conjugate
subgroup gHg^{-1} of H (6.7.7), and since K fixes c’, the stabilizer contains K. □


Proof of the Third Sylow Theorem. We write |G| = p^e m as before. Let s denote the number
of Sylow p-subgroups. The Second Sylow Theorem tells us that the operation of G on the
set S of Sylow p-subgroups is transitive. The stabilizer of a particular Sylow p-subgroup [H]
is the normalizer N = N(H) of H. The counting formula tells us that the order of S, which
is s, is equal to the index [G : N]. Since N contains H (7.6.3) and since [G : H] is equal to m,
s divides m.

Next, we decompose the set S into orbits for the operation of conjugation by H. The
H-orbit of [H] has order 1. Since H is a p-group, the order of any H-orbit is a power of p.
To show that s ≡ 1 modulo p, we show that no element of S except [H] is fixed by H, so that
every other H-orbit has order divisible by p.

Suppose that H’ is a Sylow p-subgroup and that conjugation by H fixes [H’]. Then H
is contained in the normalizer N’ of H’, so both H and H’ are Sylow p-subgroups of N’. The
Second Sylow Theorem tells us that the Sylow p-subgroups of N’ are conjugate subgroups of
N’. But H’ is a normal subgroup of N’ (7.6.3)(a). Therefore H’ = H. □


7.8 GROUPS OF ORDER 12


We use the Sylow Theorems to classify groups of order 12. This theorem serves to illustrate 
the fact that classifying groups becomes complicated when the order has several factors. 


Theorem 7.8.1 There are five isomorphism classes of groups of order 12. They are 
represented by: 


• the product of cyclic groups C_4 × C_3,

• the product of cyclic groups C_2 × C_2 × C_3,

• the alternating group A_4,

• the dihedral group D_6,

• the group generated by elements x and y, with relations x^4 = 1, y^3 = 1, xy = y^2 x.


All but the last of these groups should be familiar. The product group C_4 × C_3 is isomorphic
to C_12, and C_2 × C_2 × C_3 is isomorphic to C_2 × C_6 (see Proposition 2.11.3).


Proof. Let G be a group of order 12, let H be a Sylow 2-subgroup of G, which has order 4,
and let K be a Sylow 3-subgroup of order 3. It follows from the Third Sylow Theorem that
the number of Sylow 2-subgroups is either 1 or 3, and that the number of Sylow 3-subgroups
is 1 or 4. Also, H is a group of order 4 and is therefore either a cyclic group C_4 or the Klein
four group C_2 × C_2 (Proposition 2.11.5). Of course K is cyclic.

Though this is not necessary for the proof, we begin by showing that at least one of the
two subgroups, H or K, is normal. If K is not normal, there will be four Sylow 3-subgroups
conjugate to K, say K_1, ..., K_4, with K_1 = K. These groups have prime order, so the
intersection of any two of them is the trivial group <1>. Together they contain the identity
and 4 · 2 = 8 elements of order 3, so there are only three elements of G that are not in any
of the groups K_i. This fact is shown schematically below.


A Sylow 2-subgroup H has order 4, and H ∩ K_i = <1>. Therefore H consists of the three
elements not in any of the groups K_i, together with 1. This describes H for us and shows
that there is only one Sylow 2-subgroup. Thus H is normal.

Next, we note that H ∩ K = <1>, so the product map H × K → G is a bijective map
of sets (2.11.4). Every element of G has a unique expression as a product hk, with h in H
and k in K.

Case 1: H and K are both normal.


Then G is isomorphic to the product group H × K (2.11.4). Since there are two
possibilities for H and one for K, there are two possibilities for G:


    G ≅ C_4 × C_3    or    G ≅ C_2 × C_2 × C_3.

These are the abelian groups of order 12.


Case 2: K is not normal. 


There are four conjugate Sylow 3-subgroups, K_1, ..., K_4, and G operates by conjugation
on this set of four. This operation determines a permutation representation, a
homomorphism φ: G → S_4 to the symmetric group. We’ll show that φ maps G isomorphically
to the alternating group A_4.

The normalizer N_i of K_i contains K_i, and the counting formula shows that |N_i| = 3.
Therefore N_i = K_i. Since the only element in common to the subgroups K_i is the identity,
only the identity stabilizes all of these subgroups. Thus the operation of G is faithful, φ is
injective, and G is isomorphic to its image in S_4.

Since G has four subgroups of order 3, it contains eight elements of order 3. Their
images are the 3-cycles in S_4, which generate A_4 (7.5.5). So the image of G contains A_4.
Since G and A_4 have the same order, the image is equal to A_4.


Case 3: K is normal, but H is not.

Then H operates by conjugation on K = {1, y, y^2}. Since H is not normal, it contains
an element x that doesn’t commute with y, and then xyx^{-1} = y^2.


Case 3a: K is normal, H is not normal, and H is a cyclic group.


The element x generates H, so G is generated by elements x and y, with the relations

(7.8.2)    x^4 = 1,  y^3 = 1,  xy = y^2 x.


These relations determine the multiplication table of G, so there is at most one isomorphism
class of such groups. But we must show that these relations don’t collapse the group further,
and as with groups of order 21 (see 7.7.8), it is simplest to represent the group by matrices.
We’ll use complex matrices here. Let ω be the complex cube root of unity e^{2πi/3}. The
complex matrices


(7.8.3)    x = [  0  1 ]      y = [ ω   0  ]
               [ −1  0 ],         [ 0  ω^2 ]


satisfy the three relations, and they generate a group of order 12.
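These assertions can be verified with exact arithmetic, avoiding floating point. The sketch below assumes the matrices x = [0 1; −1 0] and y = [ω 0; 0 ω^2], and it stores a number a + bω as the pair of integers (a, b), using the relation ω^2 = −1 − ω; the helper names are our own.

```python
def zmul(u, v):
    """Multiply a + b*w and c + d*w, where w^2 = -1 - w."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c - b * d)

def zadd(u, v):
    """Add two numbers of the form a + b*w."""
    return (u[0] + v[0], u[1] + v[1])

def mmul(A, B):
    """Multiply two 2x2 matrices whose entries are pairs (a, b)."""
    return tuple(
        tuple(zadd(zmul(A[i][0], B[0][j]), zmul(A[i][1], B[1][j])) for j in range(2))
        for i in range(2)
    )

ZERO, ONE, MINUS_ONE = (0, 0), (1, 0), (-1, 0)
W, W2 = (0, 1), (-1, -1)            # w and w^2 = -1 - w
IDENTITY = ((ONE, ZERO), (ZERO, ONE))

def power(A, n):
    """Return the n-th power of the matrix A, for n >= 0."""
    result = IDENTITY
    for _ in range(n):
        result = mmul(result, A)
    return result

def generate(gens):
    """Return the set of all products of the given generators."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mmul(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

x = ((ZERO, ONE), (MINUS_ONE, ZERO))
y = ((W, ZERO), (ZERO, W2))

relations_hold = (
    power(x, 4) == IDENTITY
    and power(y, 3) == IDENTITY
    and mmul(x, y) == mmul(power(y, 2), x)
)
group = generate([x, y])
print(relations_hold, len(group))
```

Six of the twelve matrices are diagonal and six are antidiagonal. The element x^2 is the matrix −1, which has order 2 and commutes with every element of the group.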


Case 3b: K is normal, H is not normal, and H ≅ C_2 × C_2.


The stabilizer of y for the operation of H by conjugation on the set {y, y^2} has order 2.
So H contains an element z ≠ 1 such that zy = yz and also an element x such that xy = y^2 x.
Since H is abelian, xz = zx. Then G is generated by three elements x, y, z, with relations

    x^2 = 1,  y^3 = 1,  z^2 = 1,  yz = zy,  xz = zx,  xy = y^2 x.

These relations determine the multiplication table of the group, so there is at most one
isomorphism class of such groups. The dihedral group D_6 isn’t one of the four groups
described before, so it must be this one. Therefore G is isomorphic to D_6. □


7.9 THE FREE GROUP 


We have seen that one can compute in the symmetric group S_3 using the usual generators x
and y, together with the relations x^3 = 1, y^2 = 1, and yx = x^2 y. In the rest of the chapter,
we study generators and relations in other groups.

We first consider groups with generators that satisfy no relations other than ones (such 
as the associative law) that are implied by the group axioms. A set of group elements that 
satisfy no relations except those implied by the axioms is called free, and a group that has a 
free set of generators is called a free group. 

To describe free groups, we start with an arbitrary set, say S = {a, b, c, ...}. We call its
elements “symbols,” and we define a word to be a finite string of symbols, in which repetition
is allowed. For instance a, aa, ba, and aaba are words. Two words can be composed by
juxtaposition, that is, placing them side by side:


    aa, ba ⇝ aaba.


This is an associative law of composition on the set W of words. We include the “empty
word” in W as an identity element, and we use the symbol 1 to denote it. Then the set
W becomes what is called the free semigroup on the set S. It isn’t a group because it lacks
inverses, and adding inverses complicates things a little.

Let S’ be the set that consists of symbols a and a^{-1} for every a in S:


(7.9.1)    S’ = {a, a^{-1}, b, b^{-1}, c, c^{-1}, ...},

and let W’ be the semigroup of words made using the symbols in S’. If a word looks like

    ··· x x^{-1} ···    or    ··· x^{-1} x ···


for some x in S, we may agree to cancel the two symbols x and x^{-1} to reduce the length of
the word. A word is called reduced if no such cancellation can be made. Starting with any
word w in W’, we can perform a finite sequence of cancellations and must eventually get a
reduced word w_0, possibly the empty word 1. We call w_0 a reduced form of w.

There may be more than one way to proceed with cancellation. For instance, starting
with w = abb^{-1}c^{-1}cb, we can proceed in two ways:


    a b b^{-1} c^{-1} c b          a b b^{-1} c^{-1} c b
            ↓                              ↓
       a c^{-1} c b                   a b b^{-1} b
            ↓                              ↓
           a b                            a b


The same reduced word is obtained at the end, though the symbols come from different
places in the original word. (On the left, the b that remains is the last symbol of w. On the
right, if b^{-1} is cancelled against the final b, it is the second symbol.) This is always true.


Proposition 7.9.2 There is only one reduced form of a given word w.

Proof. We use induction on the length of w. If w is reduced, there is nothing to show. If not,
there must be some pair of symbols that can be cancelled, say the underlined pair

$$w = \cdots \underline{x x^{-1}} \cdots.$$

(We allow x to denote any element of S', with the understanding that if $x = a^{-1}$ then
$x^{-1} = a$.) If we show that we can obtain every reduced form of w by cancelling the pair $xx^{-1}$
first, the proposition will follow by induction, because the word obtained by cancelling that
pair is shorter.

Let $w_0$ be a reduced form of w. It is obtained from w by some sequence of cancellations.
The first case is that our pair $xx^{-1}$ is cancelled at some step in this sequence. If so, we may
as well cancel $xx^{-1}$ first. So this case is settled. On the other hand, since $w_0$ is reduced, the
pair $xx^{-1}$ cannot remain in $w_0$. At least one of the two symbols must be cancelled at some
time. If the pair itself is not cancelled, the first cancellation involving the pair must look like

$$\cdots \underbrace{x^{-1}x}_{\text{cancelled}}\,x^{-1} \cdots \quad\text{or}\quad \cdots x\,\underbrace{x^{-1}x}_{\text{cancelled}} \cdots.$$

Notice that the word obtained by this cancellation is the same as the one obtained by
cancelling the pair $xx^{-1}$. So at this stage we may cancel the original pair instead. Then we
are back in the first case, so the proposition is proved. □


We call two words w and w’ in W’ equivalent, and we write w ~ w’, if they have the 
same reduced form. This is an equivalence relation. 


Proposition 7.9.3 Products of equivalent words are equivalent: If w ~ w' and v ~ v', then
wv ~ w'v'.


Proof. To obtain the reduced word equivalent to the product wv, we may first cancel as
much as possible in w and in v, to reduce w to $w_0$ and v to $v_0$. Then wv is reduced to $w_0v_0$.
Now we continue, cancelling in $w_0v_0$ until the word is reduced. If w ~ w' and v ~ v', the
same process, when applied to w'v', passes through $w_0v_0$ too, so it leads to the same
reduced word. □


It follows from this proposition that equivalence classes of words can be multiplied: 


Proposition 7.9.4 The set F of equivalence classes of words in W’ is a group, with the law 
of composition induced from multiplication (juxtaposition) in W’. 


Proof. The facts that multiplication is associative and that the class of the empty word 1 is
an identity follow from the corresponding facts in W' (see Lemma 2.12.8). We must check
that all elements of F are invertible. But clearly, if w is the product $xy \cdots z$ of elements of
S', then the class of $z^{-1} \cdots y^{-1}x^{-1}$ inverts the class of w. □


The group F of equivalence classes of words in S' is called the free group on the set
S. An element of F corresponds to exactly one reduced word in W'. To multiply reduced
words, combine and cancel: $(abc^{-1})(cb) \rightsquigarrow abc^{-1}cb = abb$.

Power notation may be used: $aaab^{-1}b^{-1} = a^3b^{-2}$.
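The "combine and cancel" rule can be sketched in a few lines of Python. The encoding is an assumption made only for this illustration: a lowercase letter stands for a symbol of S and the matching uppercase letter for its inverse, so that `"abBCcb"` stands for $abb^{-1}c^{-1}cb$. A single left-to-right pass with a stack reaches a reduced form, and Proposition 7.9.2 says that this form does not depend on the order of cancellation.

```python
def reduce_word(word):
    """Cancel adjacent pairs x x^-1 and x^-1 x until none remain."""
    stack = []
    for sym in word:
        # Two symbols cancel when they are the same letter in opposite cases.
        if stack and stack[-1] != sym and stack[-1].lower() == sym.lower():
            stack.pop()
        else:
            stack.append(sym)
    return "".join(stack)


def multiply(u, v):
    """Product in the free group: juxtapose, then reduce."""
    return reduce_word(u + v)


def inverse(word):
    """Reverse the word and invert every symbol."""
    return word[::-1].swapcase()


print(reduce_word("abBCcb"))   # the word w = a b b^-1 c^-1 c b from the text
print(multiply("abC", "cb"))   # (a b c^-1)(c b)
```

The names `reduce_word`, `multiply`, and `inverse` are hypothetical helpers, not notation from the text.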


Note: The free group on a set S = {a} of one element is simply an infinite cyclic group. In
contrast, the free group on a set of two or more elements is quite complicated.

7.10 GENERATORS AND RELATIONS


Having described free groups, we now consider the more common case, that a set of 
generators of a group is not free — that there are some nontrivial relations among them. 


Definition 7.10.1 A relation R among elements $x_1, \ldots, x_n$ of a group G is a word r in the
free group on the set $\{x_1, \ldots, x_n\}$ that evaluates to 1 in G. We will write such a relation
either as r, or for emphasis, as r = 1.


For example, the dihedral group $D_n$ of symmetries of a regular n-sided polygon is
generated by the rotation x with angle $2\pi/n$ and a reflection y, and these generators satisfy
relations that were listed in (6.4.3):


(7.10.2) $x^n = 1,\quad y^2 = 1,\quad xyxy = 1.$


(The last relation is often written as $yx = x^{-1}y$, but it is best to write every relation in the
form r = 1 here.)

One can use these relations to write the elements of $D_n$ in the form $x^i y^j$ with $0 \le i < n$
and $0 \le j < 2$, and then one can compute the multiplication table for the group. So
the relations determine the group. They are therefore called defining relations. When the 
relations are more complicated, it can be difficult to determine the elements of the group 
and the multiplication table explicitly, but, using the free group and the next lemma, we 
will define the concept of a group generated by a given set of elements, with a given set of 
relations. 
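As a small illustration of how the relations (7.10.2) determine the multiplication table of $D_n$, here is a sketch in Python. It assumes that an element $x^i y^j$ is stored as the pair `(i, j)`; the function name `dihedral_multiply` is hypothetical. The only fact used is that $yx = x^{-1}y$, so moving $y^j$ past $x^k$ replaces k by $-k$ when j = 1.

```python
def dihedral_multiply(a, b, n):
    """Multiply x^i y^j by x^k y^l in D_n and return the normal form (i', j')."""
    i, j = a
    k, l = b
    # y x = x^-1 y, so y^j x^k = x^(-k) y^j when j = 1, and x^k y^j when j = 0.
    sign = -1 if j == 1 else 1
    return ((i + sign * k) % n, (j + l) % 2)


n = 4
elements = [(i, j) for j in range(2) for i in range(n)]
# The full multiplication table, indexed by pairs of elements.
table = {(a, b): dihedral_multiply(a, b, n) for a in elements for b in elements}

x, y = (1, 0), (0, 1)
print(table[(y, x)])   # y x written in the form x^i y^j
```

With n = 4 the table has 64 entries, one for each ordered pair of the 8 elements.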


Lemma 7.10.3 Let R be a subset of a group G. There exists a unique smallest normal 
subgroup N of G that contains R, called the normal subgroup generated by R. If a normal 
subgroup of G contains R, it contains N. The elements of N can be described in either of 
the following ways: 


(a) An element of G is in N if it can be obtained from the elements of R using a finite 
sequence of the operations of multiplication, inversion, and conjugation. 

(b) Let R' be the set consisting of elements r and $r^{-1}$ with r in R. An element of G is in N
if it can be written as a product $y_1 \cdots y_r$ of some arbitrary length, where each $y_\nu$ is a
conjugate of an element of R'.


Proof. Let N denote the set of elements obtained by a sequence of the operations mentioned
in (a). A nonempty subset is a normal subgroup if and only if it is closed under those
operations. Since N is closed under those operations, it is a normal subgroup. Moreover,
any normal subgroup that contains R must contain N. So the smallest normal subgroup
containing R exists, and is equal to N. Similar reasoning identifies N as the subset described
in (b). □

As usual, we must take care of the empty set. We say that the empty set generates the trivial
subgroup {1}.
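For a finite group, description (b) of the lemma can be carried out by machine. The sketch below is an illustration under stated assumptions: G is a group of permutations stored as tuples, and the helper names (`perm`, `mul`, `inv`, `generated`, `normal_closure`) are hypothetical. It forms all conjugates of the elements of R and then closes under multiplication. In a finite group, closing under multiplication also produces the inverses, so R' need not be formed separately.

```python
def perm(cycles, n):
    """Permutation of {1, ..., n} given by cycles, stored 0-based as a tuple."""
    images = list(range(n))
    for cycle in cycles:
        for pos, a in enumerate(cycle):
            images[a - 1] = cycle[(pos + 1) % len(cycle)] - 1
    return tuple(images)


def mul(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    out = [0] * len(p)
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)


def generated(gens, n):
    """Subgroup generated by gens (closure under multiplication suffices here)."""
    identity = tuple(range(n))
    seen, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


def normal_closure(R, G, n):
    """Normal subgroup of the finite group G generated by R."""
    conjugates = {mul(mul(g, r), inv(g)) for g in G for r in R}
    return generated(conjugates, n)


S4 = generated([perm([(1, 2, 3, 4)], 4), perm([(1, 2)], 4)], 4)
V = normal_closure([perm([(1, 2), (3, 4)], 4)], S4, 4)
print(len(S4), len(V))
```

With R empty, the set of conjugates is empty and the function returns the trivial subgroup, in agreement with the convention stated above.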


Definition 7.10.4 Let F be the free group on a set $S = \{x_1, \ldots, x_n\}$, and let $R = \{r_1, \ldots, r_k\}$
be a set of elements of F. The group generated by S, with relations $r_1 = 1, \ldots, r_k = 1$, is
the quotient group $G = F/\mathcal{R}$, where $\mathcal{R}$ is the normal subgroup of F generated by R.

The group G will often be denoted by


(7.10.5) $\langle x_1, \ldots, x_n \mid r_1, \ldots, r_k \rangle.$


Thus the dihedral group $D_n$ is isomorphic to the group

(7.10.6) $\langle x, y \mid x^n, y^2, xyxy \rangle.$


Example 7.10.7 In the tetrahedral group T of rotational symmetries of a regular tetrahedron,
let x and y denote rotations by $2\pi/3$ about the center of a face and about a vertex, and let z
denote rotation by $\pi$ about the center of an edge, as shown below. With vertices numbered
as in the figure, x acts on the vertices as the permutation (234), y acts as (123), and z acts
as (13)(24). Computing the product of these permutations shows that xyz acts trivially on
the vertices. Since the only isometry that fixes all vertices is the identity, xyz = 1.


(7.10.8) [Figure: a regular tetrahedron with numbered vertices, showing the rotations x, y, and z.]
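The product of the three permutations in Example 7.10.7 can be checked mechanically. The sketch below assumes the convention that a product such as xyz is read from right to left, as a composition of functions (apply z first, then y, then x); the helper names `from_cycles` and `compose` are hypothetical.

```python
def from_cycles(cycles, n=4):
    """Permutation of {1, ..., n} as a dict, built from a list of cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, a in enumerate(cycle):
            perm[a] = cycle[(pos + 1) % len(cycle)]
    return perm


def compose(p, q):
    """The composition p∘q: apply q first, then p."""
    return {i: p[q[i]] for i in q}


x = from_cycles([(2, 3, 4)])
y = from_cycles([(1, 2, 3)])
z = from_cycles([(1, 3), (2, 4)])
identity = from_cycles([])

xyz = compose(x, compose(y, z))
print(xyz == identity)
```

The same two helpers also confirm that the cube of x, the cube of y, and the square of z act trivially on the vertices.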


So the following relations hold in the tetrahedral group: 


(7.10.9) $x^3 = 1,\quad y^3 = 1,\quad z^2 = 1,\quad xyz = 1.$

Two questions arise:


1. Is this a set of defining relations for T? In other words, is the group

(7.10.10) $\langle x, y, z \mid x^3, y^3, z^2, xyz \rangle$

isomorphic to T?

It is easy to verify that the rotations x, y, z generate T, but it isn't particularly easy
to work with the relations. It is confusing enough to list the 12 elements of the group
as products of the generators without repetition. We show in the next section that the
answer to our question is yes, but we don't do that by writing the elements of the group
explicitly.


2. How can one compute in a group $G = \langle x_1, \ldots, x_n \mid r_1, \ldots, r_k \rangle$ that is presented by
generators and relations?

Because computation in the free group F is easy, the only problem is to decide when an
element w of the free group represents the identity element of G, i.e., when w is an element
of the subgroup $\mathcal{R}$. This is the word problem for G. If we can solve the word problem, then
because the relation $w_1 = w_2$ is equivalent to $w_1^{-1}w_2 = 1$, we will be able to decide when
two elements of the free group represent equal elements of G. This will enable us to compute.

The word problem can be solved in any finite group, but not in every group. However, 
we won’t discuss this point, because some work is required to give a precise meaning to 
the statement that the word problem can or cannot be solved. If you are interested, see 
[Stillwell]. 

The next example shows that computation in $\mathcal{R}$ can become complicated, even in a
relatively simple case.


Example 7.10.11 The element w = yxyx is equal to 1 in the group T. Let's verify that w
is in the normal subgroup $\mathcal{R}$ generated by the four relations (7.10.9). We use what you will
recognize as a standard method: reducing w to the identity by the allowed operations.

The relations that we will use are $z^2$ and xyz, and we'll denote them by p and q,
respectively. First, let $w_1 = y^{-1}wy = xyxy$. Because $\mathcal{R}$ is a normal subgroup, $w_1$ is in $\mathcal{R}$ if
and only if w is. Next, let $w_2 = q^{-1}w_1 = z^{-1}xy$. Since q is in $\mathcal{R}$, $w_2$ is in $\mathcal{R}$ if and only if $w_1$
is. Continuing, $w_3 = zw_2z^{-1} = xyz^{-1}$, $w_4 = q^{-1}w_3 = z^{-1}z^{-1}$, $pw_4 = 1$. Solving back,
$w = yqz^{-1}qp^{-1}zy^{-1}$ is in $\mathcal{R}$. Thus w is equal to 1 in the group (7.10.10). □
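The bookkeeping in Example 7.10.11 takes place in the free group, so it can be checked by free reduction alone. The sketch below assumes the encoding in which an uppercase letter denotes the inverse of the corresponding lowercase generator; `reduce_word` and `inverse` are hypothetical helper names.

```python
def reduce_word(word):
    """Free reduction: cancel adjacent pairs such as xX or Xx."""
    stack = []
    for sym in word:
        if stack and stack[-1] != sym and stack[-1].lower() == sym.lower():
            stack.pop()
        else:
            stack.append(sym)
    return "".join(stack)


def inverse(word):
    return word[::-1].swapcase()


p, q = "zz", "xyz"            # the relations z^2 and xyz
w = "yxyx"

w1 = reduce_word("Y" + w + "y")        # y^-1 w y
w2 = reduce_word(inverse(q) + w1)      # q^-1 w1
w3 = reduce_word("z" + w2 + "Z")       # z w2 z^-1
w4 = reduce_word(inverse(q) + w3)      # q^-1 w3
print(w1, w2, w3, w4, repr(reduce_word(p + w4)))

# Solving back: w = y q z^-1 q p^-1 z y^-1 as an element of the free group.
solved = reduce_word("y" + q + "Z" + q + inverse(p) + "z" + "Y")
print(solved == w)
```

The final expression shows w as a product of conjugates of q, q, and $p^{-1}$, which is the form described in Lemma 7.10.3(b).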


We return to the group G defined by generators and relations. As with any quotient 
group, we have a canonical homomorphism 


$$\pi: F \to F/\mathcal{R} = G$$


that sends a word w to the coset $\bar{w} = [w\mathcal{R}]$, and the kernel of $\pi$ is $\mathcal{R}$ (2.12.2). To keep
track of the group in which we are working, it might seem safer to denote the images in G of
elements of F by putting bars over the letters. However, this isn't customary. When working
in G, one simply remembers that elements $w_1$ and $w_2$ of the free group are equal in G if the
cosets $w_1\mathcal{R}$ and $w_2\mathcal{R}$ are equal, or if $w_1^{-1}w_2$ is in $\mathcal{R}$.

Since the defining relations $r_i$ are in $\mathcal{R}$, $r_i = 1$ is true in G. If we write $r_i$ out as words,
then because $\pi$ is a homomorphism, the corresponding product in G will be equal to 1 (see
Corollary 2.12.3). For instance, xyz = 1 is true in the group $\langle x, y, z \mid x^3, y^3, z^2, xyz \rangle$.


We go back once more to the example of the tetrahedral group and to the first question.
How is the group $\langle x, y, z \mid x^3, y^3, z^2, xyz \rangle$ related to T? A partial explanation is based on
the mapping properties of free groups and of quotient groups. Both of these properties are
intuitive. Their proofs are simple enough that we leave them as exercises.


Proposition 7.10.12 Mapping Property of the Free Group. Let F be the free group on a set
S = {a, b, ...}, and let G be a group. Any map of sets $f: S \to G$ extends in a unique way
to a group homomorphism $\varphi: F \to G$. If we denote the image f(x) of an element x of S
by $\underline{x}$, then $\varphi$ sends a word in $S' = \{a, a^{-1}, b, b^{-1}, \ldots\}$ to the corresponding product of the
elements $\{\underline{a}, \underline{a}^{-1}, \underline{b}, \underline{b}^{-1}, \ldots\}$ in G. □


This property reflects the fact that the elements of S satisfy no relations in F except those
implied by the group axioms. It is the reason for the adjective "free."
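The mapping property is easy to see in action. In the sketch below, the target group is taken to be the permutations of {0, 1, 2}, stored as tuples, and the images chosen for a and b are arbitrary; both choices are assumptions made for illustration. The function `extend` (a hypothetical name) turns a map f on the symbols into the function $\varphi$ on words, with an uppercase letter standing for the inverse of a generator.

```python
def compose(p, q):
    """Product pq of permutations stored as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def invert(p):
    out = [0] * len(p)
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)


def extend(f):
    """Extend a map f on generators to the function phi on words."""
    degree = len(next(iter(f.values())))

    def phi(word):
        result = tuple(range(degree))          # image of the empty word
        for sym in word:
            g = f[sym] if sym.islower() else invert(f[sym.lower()])
            result = compose(result, g)
        return result

    return phi


# An arbitrary choice of images for a and b.
f = {"a": (1, 2, 0), "b": (1, 0, 2)}
phi = extend(f)

print(phi("abB") == phi("a"))                       # cancellation does not change the image
print(phi("ab" + "ba") == compose(phi("ab"), phi("ba")))
```

Because equivalent words have the same image, `phi` is well defined on the free group, and the second line printed is the homomorphism property for one pair of words.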


Proposition 7.10.13 Mapping Property of Quotient Groups. Let $\varphi: G' \to G$ be a group
homomorphism with kernel K, and let N be a normal subgroup of G' that is contained in K.
Let $\bar{G}' = G'/N$, and let $\pi: G' \to \bar{G}'$ be the canonical map $a \rightsquigarrow \bar{a}$. The rule $\bar{\varphi}(\bar{a}) = \varphi(a)$
defines a homomorphism $\bar{\varphi}: \bar{G}' \to G$, and $\bar{\varphi} \circ \pi = \varphi$.

$$\begin{array}{ccc}
G' & \xrightarrow{\ \varphi\ } & G \\
{\scriptstyle\pi}\big\downarrow & \nearrow{\scriptstyle\bar{\varphi}} & \\
\bar{G}' & &
\end{array}$$

This mapping property generalizes the First Isomorphism Theorem. The hypothesis that N
be contained in the kernel K is, of course, essential.

The next corollary uses notation introduced previously: $S = \{x_1, \ldots, x_n\}$ is a subset
of a group G, $R = \{r_1, \ldots, r_k\}$ is a set of relations among the elements of S of G, F
is the free group on S, and $\mathcal{R}$ is the normal subgroup of F generated by R. Finally,
$\mathcal{G} = \langle x_1, \ldots, x_n \mid r_1, \ldots, r_k \rangle = F/\mathcal{R}$.


Corollary 7.10.14
(i) There is a canonical homomorphism $\psi: \mathcal{G} \to G$ that sends $x_i \rightsquigarrow x_i$.

(ii) $\psi$ is surjective if and only if the set S generates G.

(iii) $\psi$ is injective if and only if every relation among the elements of S is in $\mathcal{R}$.


Proof. We will prove (i), and omit the verification of (ii) and (iii). The mapping property of
the free group gives us a homomorphism $\varphi: F \to G$ with $\varphi(x_i) = x_i$. Since the relations
$r_i$ evaluate to 1 in G, R is contained in the kernel K of $\varphi$. Since the kernel is a normal
subgroup, $\mathcal{R}$ is also contained in K. Then the mapping property of quotient groups gives us
a map $\bar{\varphi}: \mathcal{G} \to G$. This is the map $\psi$:

$$\begin{array}{ccc}
F & \xrightarrow{\ \varphi\ } & G \\
{\scriptstyle\pi}\big\downarrow & \nearrow{\scriptstyle\psi = \bar{\varphi}} & \\
\mathcal{G} & &
\end{array}$$
□


If the map $\psi$ described in the corollary is bijective, one says that R forms a complete
set of relations among the generators S. To decide whether this is true requires knowing
more about G. Going back to the tetrahedral group, the corollary gives us a homomorphism
$\psi: \mathcal{G} \to T$, where $\mathcal{G} = \langle x, y, z \mid x^3, y^3, z^2, xyz \rangle$. It is surjective because x, y, z generate T.
And we saw in Example 7.10.11 that the relation yxyx, which holds among the elements
of T, is in the normal subgroup $\mathcal{R}$ generated by the set $\{x^3, y^3, z^2, xyz\}$. Is every relation
among x, y, z in $\mathcal{R}$? If not, we'd want to add some more relations to our list. It may seem
disappointing not to have the answer to this question yet, but we will see in the next section
that $\psi$ is indeed bijective.


Recapitulating, when we speak of a group defined by generators S and relations R, we
mean the quotient group $G = F/\mathcal{R}$, where F is the free group on S and $\mathcal{R}$ is the normal
subgroup of F generated by R. Any set of relations will define a group. The larger R is, the
larger $\mathcal{R}$ becomes, and the more collapsing takes place in the homomorphism $\pi: F \to G$.
The extreme case is $\mathcal{R} = F$, in which case G is the trivial group. All relations become true in
the trivial group. Problems arise because computation in $F/\mathcal{R}$ may be difficult. But because
generators and relations allow efficient computation in many cases, they are a useful tool.


7.11 THE TODD-COXETER ALGORITHM 


The Todd-Coxeter Algorithm, which is described in this section, is an amazing method for 
determining the operation of a finite group G on the set of cosets of a subgroup H. 
In order to compute, both G and H must be given explicitly. So we consider a group


(7.11.1) $G = \langle x_1, \ldots, x_m \mid r_1, \ldots, r_k \rangle$


presented by generators and relations, as in the previous section.
We also assume that the subgroup H of G is given explicitly, by a set of words


(7.11.2) $\{h_1, \ldots, h_s\}$


in the free group F, whose images in G generate H.

The algorithm proceeds by constructing some tables that become easier to read when
one works with right cosets Hg. The group G operates by right multiplication on the set of
right cosets, and this changes the order of composition of operations. A product gh acts by
right multiplication as "first multiply by g, then by h". Similarly, when we want permutations
to operate on the right, we must read a product this way:

$$\underbrace{(234)}_{\text{first do this}} \circ \underbrace{(123)}_{\text{then this}} = (12)(34).$$
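The right-operating convention is easy to get wrong, so here is a short check, written as a sketch with hypothetical helper names. The function `then(p, q)` applies p first and q second, which is how a product pq is read when permutations operate on the right.

```python
def from_cycles(cycles, n):
    """Permutation of {1, ..., n} as a dict, built from a list of cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, a in enumerate(cycle):
            perm[a] = cycle[(pos + 1) % len(cycle)]
    return perm


def then(p, q):
    """Right-operating product: do p first, then q."""
    return {i: q[p[i]] for i in p}


first = from_cycles([(2, 3, 4)], 4)
second = from_cycles([(1, 2, 3)], 4)

print(then(first, second) == from_cycles([(1, 2), (3, 4)], 4))
print(then(second, first) == from_cycles([(1, 3), (2, 4)], 4))
```

Reading the same product in the other order gives (13)(24) instead of (12)(34), which is why the convention has to be fixed before the tables are filled in.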


The following rules suffice to determine the operation of G on the right cosets: 


Rules 7.11.3 


1. The operation of each generator is a permutation. 
2. The relations operate trivially: they fix every coset. 
3. The generators of H fix the coset [H].

4. The operation is transitive. 


The first rule follows from the fact that group elements are invertible, and the second one 
reflects the fact that the relations represent the identity element of G. Rules 3 and 4 are 
special properties of the operation on cosets. 

When applying these rules, the cosets are usually denoted by indices 1, 2,3, ..., with 1 
standing for the coset [H]. At the start, one doesn’t know how many indices will be needed; 
new ones are added as necessary. 

We begin with a simple example, in which we replace $y^3$ by $y^2$ in the relations (7.10.9).


Example 7.11.4 Let G be the group $\langle x, y, z \mid x^3, y^2, z^2, xyz \rangle$, and let H be the cyclic
subgroup $\langle z \rangle$ generated by z. First, Rule 3 tells us that z sends 1 to itself, $1 \xrightarrow{z} 1$. This
exhausts the information in Rule 3, so Rules 1 and 2 take over. Rule 4 will only appear
implicitly.

Nothing we have done up to now tells us what x does to the index 1. In such a case,
the procedure is simply to assign a new index, $1 \xrightarrow{x} 2$. (Since 1 stands for the coset [H], the
index 2 stands for [Hx], but it is best to ignore this.) Continuing, we don't know where x
sends the index 2, so we assign a third index, $2 \xrightarrow{x} 3$. Then $1 \xrightarrow{x^2} 3$.


What we have so far is a partial operation, meaning that the operations of some 
generators on some indices have been assigned. It is helpful to keep track of the partial 
operation as one goes along. The partial operation that we have so far is 


$$z = (1) \cdots \quad\text{and}\quad x = (1\,2\,3 \cdots.$$


There is no closing parenthesis for the partial operation of x because we haven’t determined 
the index to which x sends 3. 


Rule 2 now comes into play. It tells us that, because $x^3$ is a relation, it fixes every index.
Since $x^2$ sends 1 to 3, x must send 3 back to 1. It is customary to sum this information up in
a table that exhibits the operation of x on the indices:

$$\begin{array}{ccccccc}
 & x & & x & & x & \\ \hline
1 & & 2 & & 3 & & 1
\end{array}$$

The relation xxx appears on top, and Rule 2 is reflected in the fact that the same index 1
appears at both ends. We have now determined the partial operation

$$x = (123) \cdots,$$

except that we don't yet know whether or not the indices 1, 2, 3 represent distinct cosets.

Next, we ask for the operation of y on the index 1. Again, we don't know it, so we
assign a new index: $1 \xrightarrow{y} 4$. Rule 2 applies again. Since $y^2$ is a relation, y must send 4 back to
1. This is exhibited in the table

$$\begin{array}{ccccc}
 & y & & y & \\ \hline
1 & & 4 & & 1
\end{array}
\qquad\text{so}\quad y = (14) \cdots.$$

For review, we have now determined the entries in the table below. The four defining
relations appear on top.

$$\begin{array}{c|c|c|c}
x\ \ x\ \ x & y\ \ y & z\ \ z & x\ \ y\ \ z \\ \hline
1\;\;2\;\;3\;\;1 & 1\;\;4\;\;1 & 1\;\;1\;\;1 & 1\;\;2\;\;\_\;\;1
\end{array}$$

The missing entry in the table for xyz is 1. This follows from the fact that z acts as a
permutation that fixes the index 1. Entering 1 into the table, we see that $2 \xrightarrow{y} 1$. But we also
have $4 \xrightarrow{y} 1$. Therefore 4 = 2. We replace 4 by 2 and continue constructing a table.
The entries below have been determined:

$$\begin{array}{c|c|c|c}
x\ \ x\ \ x & y\ \ y & z\ \ z & x\ \ y\ \ z \\ \hline
1\;\;2\;\;3\;\;1 & 1\;\;2\;\;1 & 1\;\;1\;\;1 & 1\;\;2\;\;1\;\;1 \\
2\;\;3\;\;1\;\;2 & 2\;\;1\;\;2 & 2\;\;\_\;\;2 & 2\;\;3\;\;\_\;\;2 \\
3\;\;1\;\;2\;\;3 & 3\;\;\_\;\;3 & 3\;\;\_\;\;3 & 3\;\;1\;\;2\;\;3
\end{array}$$

The third row of the table for xyz shows that $2 \xrightarrow{z} 3$, and this determines the rest of the
table. There are three indices, and the complete operation is

$$x = (123),\quad y = (12),\quad z = (23).$$

At the end of the section, we will show that this is indeed the permutation representation
defined by the operation of G on the cosets of H. □


What such a table tells us depends on the particular case. It will always tell us the
number of cosets, the index [G: H], which will be equal to the number of distinct indices:
3 in our example. It may also tell us something about the order of the generators. In our
example, we are given the relation $z^2 = 1$, so the order of z must be 1 or 2. But z acts on
indices as the transposition (23), and this tells us that we can't have z = 1. So the order of z
is 2, and |H| = 2. The counting formula |G| = |H|[G: H] shows that G has order 2·3 = 6.
The three permutations shown above generate the symmetric group $S_3$, so the permutation
representation $G \to S_3$ defined by this operation is an isomorphism.
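One can check by machine that the operation found in Example 7.11.4 is consistent with Rules 7.11.3. The sketch below assumes the permutations x = (123), y = (12), z = (23) operate on the right, so a word is applied to an index by reading it from left to right; the helper names are hypothetical.

```python
def from_cycles(cycles, n):
    """Permutation of {1, ..., n} as a dict, built from a list of cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, a in enumerate(cycle):
            perm[a] = cycle[(pos + 1) % len(cycle)]
    return perm


def act(index, word, gens):
    """Apply the generators named in word to index, reading left to right."""
    for name in word:
        index = gens[name][index]
    return index


gens = {"x": from_cycles([(1, 2, 3)], 3),
        "y": from_cycles([(1, 2)], 3),
        "z": from_cycles([(2, 3)], 3)}
relations = ["xxx", "yy", "zz", "xyz"]
indices = [1, 2, 3]

# Rule 2: every relation fixes every index.
rule2 = all(act(i, r, gens) == i for r in relations for i in indices)
# Rule 3: the generator z of H fixes the index 1.
rule3 = act(1, "z", gens) == 1
# Rule 4: every index lies in the orbit of 1.
orbit, frontier = {1}, [1]
while frontier:
    i = frontier.pop()
    for g in gens.values():
        if g[i] not in orbit:
            orbit.add(g[i])
            frontier.append(g[i])
rule4 = orbit == set(indices)

print(rule2, rule3, rule4)
```

Rule 1 holds by construction here, because `from_cycles` always produces a bijection.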

If one takes for H the trivial subgroup {1}, the cosets correspond bijectively to the 
group elements, and the permutation representation determines G completely. The cost of 
doing this is that there will be many indices. In other cases, the permutation representation 
may not suffice to determine the order of G. 

We’ll compute two more examples. 


Example 7.11.5 We show that the relations (7.10.9) form a complete set of relations for
the tetrahedral group. The verification is simplified a little if one uses the relation xyz = 1
to eliminate the generator z. Since $z^2 = 1$, that relation implies that $xy = z^{-1} = z$. The
remaining elements x, y suffice to generate T. So we substitute z = xy into $z^2$, and replace
the relation $z^2$ by xyxy. The relations become


(7.11.6) $x^3 = 1,\quad y^3 = 1,\quad xyxy = 1.$


These relations among x and y are equivalent to the relations (7.10.9) among x, y, and z, so
they hold in T.

Let G denote the group $\langle x, y \mid x^3, y^3, xyxy \rangle$. Corollary (7.10.14) gives us a
homomorphism $\psi: G \to T$. To show that (7.11.6) are defining relations for T, we show that $\psi$ is
bijective. Since x and y generate T, $\psi$ is surjective. So it suffices to show that the order of G
is equal to the order of T, which is 12.

We choose the subgroup $H = \langle x \rangle$. This subgroup has order 1 or 3 because $x^3$ is one of
the relations. If we show that H has order 3 and that the index of H in G is 4, it will follow
that G has order 12, and we will be done. Here is the resulting table. To fill it in, work from
both ends of the relations.

$$\begin{array}{c|c|c}
x\ \ x\ \ x & y\ \ y\ \ y & x\ \ y\ \ x\ \ y \\ \hline
1\;\;1\;\;1\;\;1 & 1\;\;2\;\;3\;\;1 & 1\;\;1\;\;2\;\;3\;\;1 \\
2\;\;3\;\;4\;\;2 & 2\;\;3\;\;1\;\;2 & 2\;\;3\;\;1\;\;1\;\;2 \\
3\;\;4\;\;2\;\;3 & 3\;\;1\;\;2\;\;3 & 3\;\;4\;\;4\;\;2\;\;3 \\
4\;\;2\;\;3\;\;4 & 4\;\;4\;\;4\;\;4 & 4\;\;2\;\;3\;\;4\;\;4
\end{array}$$

The permutation representation is

(7.11.7) $x = (234),\quad y = (123).$

Since there are four indices, the index of H is 4. Also, x does have order 3, not 1, because
the permutation associated to x has order 3. The order of G is 12, as predicted.

Incidentally, we see that T is isomorphic to the alternating group $A_4$, because the
permutations (7.11.7) generate that group. □
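The closing remark of Example 7.11.5 can be confirmed by listing the group that the two permutations generate. The following sketch uses hypothetical helper names and stores a permutation p of {1, 2, 3, 4} as a tuple with `p[i-1]` equal to the image of i; products are read with the right-operating convention of this section.

```python
def perm(cycles, n):
    """Tuple p with p[i-1] equal to the image of i under the given cycles."""
    images = list(range(1, n + 1))
    for cycle in cycles:
        for pos, a in enumerate(cycle):
            images[a - 1] = cycle[(pos + 1) % len(cycle)]
    return tuple(images)


def then(p, q):
    """Right-operating product: do p first, then q."""
    return tuple(q[p[i] - 1] for i in range(len(p)))


def generated(gens):
    """All products of the generators; in a finite group this is the subgroup they generate."""
    identity = tuple(range(1, len(gens[0]) + 1))
    seen, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = then(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0


x, y = perm([(2, 3, 4)], 4), perm([(1, 2, 3)], 4)
group = generated([x, y])
print(len(group), all(is_even(p) for p in group))
```

A subgroup of $S_4$ of order 12 that consists of even permutations must be all of $A_4$, since $A_4$ itself has order 12.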


Example 7.11.8 We modify the relations (7.10.9) slightly, to illustrate how "bad" relations
may collapse the group. Let G be the group $\langle x, y \mid x^3, y^3, yxyxy \rangle$, and let H be the
subgroup $\langle y \rangle$. Here is a start for a table:

$$\begin{array}{c|c|c}
x\ \ x\ \ x & y\ \ y\ \ y & y\ \ x\ \ y\ \ x\ \ y \\ \hline
1\;\;2\;\;3\;\;1 & 1\;\;1\;\;1\;\;1 & 1\;\;1\;\;2\;\;3\;\;1\;\;1 \\
2\;\;3\;\;1\;\;2 & 2\;\;3\;\;\_\;\;2 & 2\;\;3\;\;1\;\;1\;\;2\;\;2
\end{array}$$

In the table for yxyxy, the first three entries in the first row are determined by working from
the left, and the last three by working from the right. That row shows that $2 \xrightarrow{y} 3$. The second
row is determined by working from the left, and it shows that $2 \xrightarrow{y} 2$. So 2 = 3. Looking
at the table for xxx, we see that then 2 = 1. There is just one index left, so one coset, and
consequently H = G. The group G is generated by y. It is a cyclic group of order 3. □


Warning: Care is essential when constructing such a table. Any mistake will cause the 
operation to collapse. 


In our examples, we took for H the subgroup generated by one of the generators of
G. If H is generated by a word h, one can introduce a new generator u and the new relation
$u^{-1}h = 1$ (i.e., u = h). Then G (7.11.1) is isomorphic to the group

$$\langle x_1, \ldots, x_m, u \mid r_1, \ldots, r_k, u^{-1}h \rangle,$$

and H becomes the subgroup generated by u. If H has several generators, we do this for
each of them.


We now address the question of why the procedure we have described determines the 
operation on cosets. A formal proof of this fact is not possible without first defining the 
algorithm formally, and we have not done this. We will discuss the question informally. (See 
[Todd-Coxeter] for a more complete discussion.) We describe the procedure this way: At a 
given stage of the computation, we will have some set I of indices, and a partial operation on 
I, the operation of some generators on some indices, will have been determined. A partial 
operation need not be consistent with Rules 1, 2, and 3, but it should be transitive; that is, 
all indices should be in the “partial orbit” of 1. This is where Rule 4 comes in. It tells us not 
to introduce any indices that we don’t need. In the starting position, I is the set {1} of one 
element, and no operations have been assigned. 

At any stage there are two possible steps: 


(7.11.9)
(i) We may equate two indices i and j if the rules tell us that they are equal, or
(ii) we may choose a generator x and an index i such that ix has not been determined, and
define ix = j, where j is a new index.

We never equate indices unless their equality is implied by the rules.

We stop the process when an operation has been determined that is consistent with
the rules. There are two questions to ask: First, will this procedure terminate? Second, if it
terminates, is the operation the right one? The answer to both questions is yes. It can be
shown that the process does terminate, provided that the group G is finite, and that preference
is given to steps of type (i). We will not prove this. More important for applications is the
fact that, if the process terminates, the resulting permutation representation is the right one.
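The two steps (7.11.9) can be turned into a short program. What follows is only a sketch of one possible strategy, not the procedure of the text made formal. Its assumptions are these: generators are single lowercase letters, an uppercase letter denotes the inverse of a generator, Rule 1 is enforced by adding the words $gg^{-1}$ and $g^{-1}g$ to the relations, and the index 1 of the text is stored internally as 0. All function names are hypothetical.

```python
def todd_coxeter(gens, rels, sub_gens, max_cosets=10_000):
    """Enumerate the right cosets of H = <sub_gens> in G = <gens | rels>.

    Returns one dict per coset: table[i - 1][g] is the index of (coset i) * g,
    with cosets numbered from 1 and coset 1 equal to [H].
    """
    symbols = [s for g in gens for s in (g, g.upper())]
    # Tracing g g^-1 and g^-1 g at every index forces g to act as a permutation.
    all_rels = list(rels) + [g + g.upper() for g in gens] + [g.upper() + g for g in gens]
    parent = [0]                          # union-find: parent[c] == c for live indices
    table = [dict.fromkeys(symbols)]      # table[c][s] is c*s, or None if undetermined

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def step(c, s):
        c = find(c)
        if table[c][s] is None:           # step (ii): introduce a new index
            if len(table) >= max_cosets:
                raise RuntimeError("coset limit reached")
            parent.append(len(parent))
            table.append(dict.fromkeys(symbols))
            table[c][s] = len(table) - 1
        return find(table[c][s])

    def trace(c, word):
        for s in word:
            c = step(c, s)
        return c

    def equate(a, b):                     # step (i), with all of its consequences
        queue = [(a, b)]
        while queue:
            a, b = queue.pop()
            a, b = find(a), find(b)
            if a == b:
                continue
            if a > b:
                a, b = b, a
            parent[b] = a                 # keep the smaller index
            for s in symbols:
                t = table[b][s]
                if t is None:
                    continue
                if table[a][s] is None:
                    table[a][s] = t
                else:
                    queue.append((table[a][s], t))

    for h in sub_gens:                    # Rule 3: generators of H fix [H]
        equate(trace(0, h), 0)
    c = 0
    while c < len(table):                 # Rule 2: relations fix every live index
        if find(c) == c:
            for r in all_rels:
                equate(trace(c, r), c)
        c += 1

    live = [c for c in range(len(table)) if find(c) == c]
    number = {c: i + 1 for i, c in enumerate(live)}
    return [{g: number[find(table[c][g])] for g in gens} for c in live]


for gens, rels, sub in [("xyz", ["xxx", "yy", "zz", "xyz"], ["z"]),
                        ("xy", ["xxx", "yyy", "xyxy"], ["x"]),
                        ("xy", ["xxx", "yyy", "yxyxy"], ["y"])]:
    print(len(todd_coxeter(list(gens), rels, sub)))
```

The three calls correspond to Examples 7.11.4, 7.11.5, and 7.11.8. This sketch introduces many more indices than a careful hand computation does, because it defines a new index whenever a trace meets an undetermined entry; the numbering of the cosets it returns may therefore differ from the numbering in the examples. The `max_cosets` guard is there because the process need not stop when the index is infinite.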


Theorem 7.11.10 Suppose that a finite number of repetitions of steps (i) and (ii) yields a
consistent table compatible with the rules (7.11.3). Then the table defines a permutation
representation that, by suitable numbering, is the representation on the right cosets of H in G.


Proof. Say that the group is $G = \langle x_1, \ldots, x_n \mid r_1, \ldots, r_k \rangle$, and let I* denote the final set of
indices. For each generator $x_i$, the table determines a permutation of the indices, and the
relations operate trivially. Corollary 7.10.14 gives us a homomorphism from G to the group
of permutations of I*, and therefore an operation, on the right, of G on I* (see Proposition
6.11.2). Provided that we have followed the rules, the table will show that the operation of
G is transitive, and that the subgroup H fixes the index 1.

Let C denote the set of right cosets of H. We prove the proposition by defining a
bijective map $\varphi^*: I^* \to C$ from I* to C that is compatible with the operations of the group on
the two sets. We define $\varphi^*$ inductively, by defining at each stage a map $\varphi: I \to C$ from the
set of indices determined at that stage to C, compatible with the partial operation on I that
has been determined. To start, $\varphi_0: \{1\} \to C$ sends $1 \rightsquigarrow [H]$. Suppose that $\varphi: I \to C$ has been
defined, and let I' be the result of applying one of the steps (7.11.9) to I.

In case of step (ii), there is no difficulty in extending $\varphi$ to a map $\varphi': I' \to C$. Say that
$\varphi(i)$ is the coset [Hg], and that the operation of a generator x on i has been defined to be
a new index, say ix = j. Then we define $\varphi'(j) = [Hgx]$, and we define $\varphi'(k) = \varphi(k)$ for all
other indices.

Next, suppose that we use step (i) to equate the indices i and j, so that I is collapsed to
form the new index set I'. The next lemma allows us to define the map $\varphi': I' \to C$.


Lemma 7.11.11 Suppose that a map $\varphi: I \to C$ is given, compatible with a partial operation on
I. Let i and j be indices in I, and suppose that one of the rules forces i = j. Then $\varphi(i) = \varphi(j)$.


Proof. This is true because, as we have remarked before, the operation on cosets does satisfy
the rules. □


The surjectivity of the map $\varphi$ follows from the fact that the operation of the group on
the set C of right cosets is transitive. As we now verify, the injectivity follows from the facts
that the stabilizer of the coset [H] is the subgroup H, and that the stabilizer of the index 1
contains H. Let i and j be indices. Since the operation on I* is transitive, i = 1a for some
group element a, and then $\varphi(i) = \varphi(1)a = [Ha]$. Similarly, if j = 1b, then $\varphi(j) = [Hb]$.
Suppose that $\varphi(i) = \varphi(j)$, i.e., that Ha = Hb. Then $H = Hba^{-1}$, so $ba^{-1}$ is an element of
H. Since H stabilizes the index 1, $1 = 1ba^{-1}$ and i = 1a = 1b = j. □


The method of postulating what we want has many advantages; 
they are the same as the advantages of theft over honest toil. 


—Bertrand Russell

EXERCISES


Section 1 Cayley’s Theorem 


1.1. Does the rule $g * x = xg^{-1}$ define an operation of G on G?

1.2. Let H be a subgroup of a group G. Describe the orbits for the operation of H on G by
left multiplication.


Section 2 The Class Equation 


2.1. Determine the centralizer and the order of the conjugacy class of
(a) the matrix […] in $GL_2(\mathbb{F}_3)$, (b) the matrix […] in $GL_2(\mathbb{F}_5)$.

2.2. A group of order 21 contains a conjugacy class C(x) of order 3. What is the order of x in
the group?

2.3. A group G of order 12 contains a conjugacy class of order 4. Prove that the center of G
is trivial.

2.4. Let G be a group, and let $\varphi$ be the nth power map: $\varphi(x) = x^n$. What can be said about
how $\varphi$ acts on conjugacy classes?

2.5. Let G be the group of matrices of the form $\begin{bmatrix} x & y \\ & 1 \end{bmatrix}$, where $x, y \in \mathbb{R}$ and x > 0. Determine
the conjugacy classes in G, and sketch them in the (x, y)-plane.

2.6. Determine the conjugacy classes in the group M of isometries of the plane.

2.7. Rule out as many as you can, as class equations for a group of order 10:
1+1+1+2+5, 1+2+2+5, 1+2+3+4, 1+1+2+2+2+2.

2.8. Determine the possible class equations of nonabelian groups of order (a) 8, (b) 21.

2.9. Determine the class equation for the following groups: (a) the quaternion group, (b) $D_4$,
(c) $D_5$, (d) the subgroup of $GL_2(\mathbb{F}_3)$ of invertible upper triangular matrices.

2.10. (a) Let A be an element of $SO_3$ that represents a rotation with angle $\pi$. Describe the
centralizer of A geometrically.
(b) Determine the centralizer of the reflection r about the $e_1$-axis in the group M of
isometries of the plane.

2.11. Determine the centralizer in $GL_3(\mathbb{R})$ of each matrix: […]

2.12. Determine all finite groups that contain at most three conjugacy classes.

2.13. Let N be a normal subgroup of a group G. Suppose that |N| = 5 and that |G| is an odd
integer. Prove that N is contained in the center of G.

2.14. The class equation of a group G is 1+4+5+5+5.
(a) Does G have a subgroup of order 5? If so, is it a normal subgroup?
(b) Does G have a subgroup of order 4? If so, is it a normal subgroup?

2.15. Verify the class equation (7.2.10) of $SL_2(\mathbb{F}_3)$.

2.16. Let $\varphi: G \to G'$ be a surjective group homomorphism, let C denote the conjugacy class of
an element x of G, and let C' denote the conjugacy class in G' of its image $\varphi(x)$. Prove
that $\varphi$ maps C surjectively to C', and that |C'| divides |C|.

2.17. Use the class equation to show that a group of order pq, with p and q prime, contains an
element of order p.

2.18. Which pairs of matrices […] are conjugate elements of (a) $GL_2(\mathbb{R})$,
(b) $SL_2(\mathbb{R})$?


Section 3 p-Groups


3.1. Prove the Fixed Point Theorem (7.3.2).

3.2. Let Z be the center of a group G. Prove that if G/Z is a cyclic group, then G is abelian,
and therefore G = Z.

3.3. A nonabelian group G has order $p^3$, where p is prime.
(a) What are the possible orders of the center Z?
(b) Let x be an element of G that isn't in Z. What is the order of its centralizer Z(x)?
(c) What are the possible class equations for G?

3.4. Classify groups of order 8.


Section 4 The Class Equation of the Icosahedral Group 


4.1. The icosahedral group operates on the set of five inscribed cubes in the dodecahedron.
Determine the stabilizer of one of the cubes.

4.2. Is $A_5$ the only proper normal subgroup of $S_5$?

4.3. What is the centralizer of an element of order 2 of the icosahedral group I?

4.4. (a) Determine the class equation of the tetrahedral group T.
(b) Prove that T has a normal subgroup of order 4, and no subgroup of order 6.

4.5. (a) Determine the class equation of the octahedral group O.
(b) This group contains two proper normal subgroups. Find them, show that they are
normal, and show that there are no others.

4.6. (a) Prove that the tetrahedral group T is isomorphic to the alternating group $A_4$, and
that the octahedral group O is isomorphic to the symmetric group $S_4$.
Hint: Find sets of four elements on which the groups operate.
(b) Two tetrahedra can be inscribed into a cube C, each one using half the vertices.
Relate this to the inclusion $A_4 \subset S_4$.

4.7. Let G be a group of order n that operates nontrivially on a set of order r. Prove that if
n > r!, then G has a proper normal subgroup.

4.8. (a) Suppose that the centralizer Z(x) of a group element x has order 4. What can be
said about the center of the group?
(b) Suppose that the conjugacy class C(y) of an element y has order 4. What can be said
about the center of the group?

4.9. Let x be an element of a group G, not the identity, whose centralizer Z(x) has order pq,
where p and q are primes. Prove that Z(x) is abelian.


Section 5 Conjugation in the Symmetric Group

5.1. (a) Prove that the transpositions (12), (23), ..., (n-1, n) generate the symmetric
group $S_n$.
(b) How many transpositions are needed to write the cycle $(1\,2\,3 \cdots n)$?
(c) Prove that the cycles $(1\,2 \cdots n)$ and (12) generate the symmetric group $S_n$.

5.2. What is the centralizer of the element (12) in $S_5$?

5.3. Determine the orders of the elements of the symmetric group $S_7$.

5.4. Describe the centralizer $Z(\sigma)$ of the permutation $\sigma = (153)(246)$ in the symmetric
group $S_7$, and compute the orders of $Z(\sigma)$ and of $C(\sigma)$.

5.5. Let p and q be permutations. Prove that the products pq and qp have cycles of equal
sizes.

5.6. Find all subgroups of $S_4$ of order 4, and decide which ones are normal.

5.7. Prove that $A_n$ is the only subgroup of $S_n$ of index 2.

5.8. ¹Determine the integers n such that there is a surjective homomorphism from the
symmetric group $S_n$ to $S_{n-1}$.

5.9. Let q be a 3-cycle in $S_n$. How many even permutations p are there such that $pqp^{-1} = q$?

5.10. Verify formulas (7.5.2) and (7.5.3) for the class equations of $S_4$ and $S_5$, and determine
the centralizer of a representative element in each conjugacy class.

5.11. (a) Let C be the conjugacy class of an even permutation p in $S_n$. Show that C is either
a conjugacy class in $A_n$, or else the union of two conjugacy classes in $A_n$ of equal
order. Explain how to decide which case occurs in terms of the centralizer of p.
(b) Determine the class equations of $A_4$ and $A_5$.
(c) One may also decompose the conjugacy classes of permutations of odd order into
$A_n$-orbits. Describe this decomposition.

5.12. Determine the class equations of $S_6$ and $A_6$.


Section 6 Normalizers 


6.1. Prove that the subgroup B of invertible upper triangular matrices in $GL_n(\mathbb{R})$ is conjugate
to the subgroup L of invertible lower triangular matrices.

6.2. Let B be the subgroup of $G = GL_n(\mathbb{C})$ of invertible upper triangular matrices, and
let $U \subset B$ be the set of upper triangular matrices with diagonal entries 1. Prove that
B = N(U) and that B = N(B).

*6.3. Let P denote the subgroup of $GL_n(\mathbb{R})$ consisting of the permutation matrices. Determine
the normalizer N(P).

6.4. Let H be a normal subgroup of prime order p in a finite group G. Suppose that p
is the smallest prime that divides the order of G. Prove that H is in the
center Z(G).

6.5. Let p be a prime integer and let G be a p-group. Let H be a proper subgroup of G.
Prove that the normalizer N(H) of H is strictly larger than H, and that H is contained
in a normal subgroup of index p.

*6.6. Let H be a proper subgroup of a finite group G. Prove:
(a) The group G is not the union of the conjugate subgroups of H.
(b) There is a conjugacy class C that is disjoint from H.


1 Suggested by Ivan Borsenko.


Section 7 The Sylow Theorems 


7.1. Let $n = p^e m$, as in (4.5.1), and let N be the number of subsets of order $p^e$ in a set of
order n. Determine the congruence class of N modulo p.

7.2. Let $G_1 \subset G_2$ be groups whose orders are divisible by p, and let $H_1$ be a Sylow p-subgroup
of $G_1$. Prove that there is a Sylow p-subgroup $H_2$ of $G_2$ such that $H_1 = H_2 \cap G_1$.

7.3. How many elements of order 5 might be contained in a group of order 20?

7.4. (a) Prove that no simple group has order pq, where p and q are prime.
(b) Prove that no simple group has order $p^2q$, where p and q are prime.

7.5. Find Sylow 2-subgroups of the following groups: (a) $D_{10}$, (b) T, (c) O, (d) I.

7.6. Exhibit a subgroup of the symmetric group $S_7$ that is a nonabelian group of order 21.

7.7. Let n = pm be an integer that is divisible exactly once by p, and let G be a group of order
n. Let H be a Sylow p-subgroup of G, and let S be the set of all Sylow p-subgroups.
Explain how S decomposes into H-orbits.

*7.8. Compute the order of $GL_n(\mathbb{F}_p)$. Find a Sylow p-subgroup of $GL_n(\mathbb{F}_p)$, and determine
the number of Sylow p-subgroups.

7.9. Classify groups of order (a) 33, (b) 18, (c) 20, (d) 30.

7.10. Prove that the only simple groups of order < 60 are the groups of prime order.


Section 8 The Groups of Order 12 


8.1. Which of the groups of order 12 described in Theorem 7.8.1 is isomorphic to $S_3 \times C_2$?

8.2. (a) Determine the smallest integer n such that the symmetric group $S_n$ contains a
subgroup isomorphic to the group (7.8.2).
(b) Find a subgroup of $SL_2(\mathbb{F}_5)$ that is isomorphic to that group.

8.3. Determine the class equations of the groups of order 12.

8.4. Prove that a group of order n = 2p, where p is prime, is either cyclic or dihedral.

8.5. Let G be a nonabelian group of order 28 whose Sylow 2-subgroups are cyclic.
(a) Determine the numbers of Sylow 2-subgroups and of Sylow 7-subgroups.
(b) Prove that there is at most one isomorphism class of such groups.
(c) Determine the numbers of elements of each order, and the class equation of G.

8.6. Let G be a group of order 55.
(a) Prove that G is generated by two elements x and y, with the relations $x^{11} = 1$,
$y^5 = 1$, $yxy^{-1} = x^r$, for some r, $1 \leq r < 11$.
(b) Decide which values of r are possible.
(c) Prove that there are two isomorphism classes of groups of order 55.


Section 9 The Free Group

9.1. Let F be the free group on {x, y}. Prove that the three elements $u = x^2$, $v = y^2$, and
z = xy generate a subgroup isomorphic to the free group on u, v, and z.

9.2. We may define a closed word in S' to be the oriented loop obtained by joining the ends
of a word. Reading counterclockwise,

[figure: the letters of a word arranged around an oriented loop]

is a closed word. Establish a bijective correspondence between reduced closed words and
conjugacy classes in the free group.


Section 10 Generators and Relations 


10.1. Prove the mapping properties of free groups and of quotient groups.

10.2. Let $\varphi: G \to G'$ be a surjective group homomorphism. Let S be a subset of G whose
image $\varphi(S)$ generates G', and let T be a set of generators of $\ker \varphi$. Prove that $S \cup T$
generates G.

10.3. Can every finite group G be presented by a finite set of generators and a finite set of
relations?

10.4. The group $G = \langle x, y \mid xyx^{-1}y^{-1} \rangle$ is called a free abelian group. Prove a mapping
property of this group: If u and v are elements of an abelian group A, there is a unique
homomorphism $\varphi: G \to A$ such that $\varphi(x) = u$, $\varphi(y) = v$.

10.5. Prove that the group generated by x, y, z with the single relation $yxyz^{-2} = 1$ is actually
a free group.

10.6. [...] of G.
(a) Prove that every characteristic subgroup is normal, and that the center Z is a
characteristic subgroup.
(b) Determine the normal subgroups and the characteristic subgroups of the quaternion
group.

10.7. The commutator subgroup C of a group G is the smallest subgroup that contains all
commutators. Prove that the commutator subgroup is a characteristic subgroup (see
Exercise 10.6), and that G/C is an abelian group.

10.8. Determine the commutator subgroups (Exercise 10.7) of the following groups:
(a) $SO_2$, (b) $O_2$, (c) the group M of isometries of the plane, (d) $S_n$, (e) $SO_3$.

10.9. Let G denote the group of $3 \times 3$ upper triangular matrices with diagonal entries equal to 1
and with entries in the field $\mathbb{F}_p$. For each prime p, determine the center, the commutator
subgroup (Exercise 10.7), and the orders of the elements of G.

10.10. Let F be the free group on x, y and let R be the smallest normal subgroup containing
the commutator $xyx^{-1}y^{-1}$.
(a) Show that $x^2y^2x^{-2}y^{-2}$ is in R.
(b) Prove that R is the commutator subgroup (Exercise 10.7) of F.


Section 11 The Todd-Coxeter Algorithm 


11.1. Complete the proof that the group given in Example 7.11.8 is cyclic of order 3.

11.2. Use the Todd-Coxeter algorithm to show that the group defined by the relations (7.8.2)
has order 12 and that the group defined by the relations (7.7.8) has order 21.

11.3. Use the Todd-Coxeter Algorithm to analyze the group generated by two elements x, y,
with the following relations. Determine the order of the group and identify the group if
you can:

Qt = =1,xayx=yxy, bio? = ee 
()x4=y=1,xyx=yxy, (d)x4=yl=xy =1, 

© =1iy (yO — ya 

Ox=1, 1, x7 = yee Sy, 

@xlyx=yl ylxry=x}, *j) y =1, x2 yxy =1. 

11.4. How is normality of a subgroup H of G reflected in the table that displays the operation
on cosets?

11.5. Let G be the group generated by elements x, y, with relations $x^4 = 1$, $y^3 = 1$, $x^2 = yxy$.
Prove that this group is trivial in two ways: using the Todd-Coxeter Algorithm, and
working directly with the relations.

11.6. A triangle group $G^{pqr}$ is a group $\langle x, y, z \mid x^p, y^q, z^r, xyz \rangle$, where $p \leq q \leq r$ are positive
integers. In each case, prove that the triangle group is isomorphic to the group listed.
(a) the dihedral group $D_n$, when p, q, r = 2, 2, n,
(b) the octahedral group, when p, q, r = 2, 3, 4,
(c) the icosahedral group, when p, q, r = 2, 3, 5.

11.7. Let $\Delta$ denote an equilateral triangle, and let a, b, c denote the reflections of the plane
about the three sides of $\Delta$. Let x = ab, y = bc, z = ca. Prove that x, y, z generate a
triangle group (Exercise 11.6).

11.8. (a) Prove that the group G generated by elements x, y, z with relations $x^2 = y^3 = z^5 = 1$,
xyz = 1 has order 60.
(b) Let H be the subgroup generated by x and $zyz^{-1}$. Determine the permutation
representation of G on G/H, and identify H.
(c) Prove that G is isomorphic to the alternating group $A_5$.
(d) Let K be the subgroup of G generated by x and yxz. Determine the permutation
representation of G on G/K, and identify K.


Miscellaneous Problems 


M.1. Classify groups that are generated by two elements x and y of order 2.
Hint: It will be convenient to make use of the element z = xy.

M.2. With the presentation (6.4.3), determine the double cosets (see Exercise M.9) HgH of
the subgroup H = {1, y} in the dihedral group $D_n$. Show that each double coset has
either two or four elements.

*M.3. (a) Suppose that a group G operates transitively on a set S, and that H is the stabilizer
of an element $s_0$ of S. Consider the operation of G on $S \times S$ defined by $g(s_1, s_2) =
(gs_1, gs_2)$. Establish a bijective correspondence between double cosets of H in G
and G-orbits in $S \times S$.
(b) Work out the correspondence explicitly for the case that G is the dihedral group $D_5$
and S is the set of vertices of a pentagon.
(c) Work it out for the case that G = T and that S is the set of edges of a tetrahedron.

*M.4. Let H and K be subgroups of a group G, with $H \subset K$. Suppose that H is normal in K,
and that K is normal in G. Is H normal in G?

M.5. Let H and N be subgroups of a group G, and assume that N is a normal subgroup.
(a) Determine the kernels of the restrictions of the canonical homomorphism $\pi: G \to
G/N$ to the subgroups H and HN.
(b) Applying the First Isomorphism Theorem to these restrictions, prove the Second
Isomorphism Theorem: $H/(H \cap N)$ is isomorphic to $(HN)/N$.

M.6. Let H and N be normal subgroups of a group G such that $H \supset N$. Let $\overline{H} = H/N$ and
$\overline{G} = G/N$.
(a) Prove that $\overline{H}$ is a normal subgroup of $\overline{G}$.
(b) Use the composed homomorphism $G \to \overline{G} \to \overline{G}/\overline{H}$ to prove the
Third Isomorphism Theorem: G/H is isomorphic to $\overline{G}/\overline{H}$.

M.7. ²Let $p_1$, $p_2$ be permutations of the set S = {1, 2, ..., n}, and let $U_i$ be the subset of S of
indices that are not fixed by $p_i$. Prove:
(a) If $U_1 \cap U_2 = \emptyset$, the commutator $p_1p_2p_1^{-1}p_2^{-1}$ is the identity.
(b) If $U_1 \cap U_2$ contains exactly one element, the commutator $p_1p_2p_1^{-1}p_2^{-1}$ is a three-cycle.

*M.8. Let H be a subgroup of a group G. Prove that the number of left cosets is equal to the
number of right cosets also when G is an infinite group.

M.9. Let x be an element, not the identity, of a group of odd order. Prove that the elements x
and $x^{-1}$ are not conjugate.


2 Suggested by Benedict Gross.




M.10. Let G be a finite group that operates transitively on a set S of order $\geq 2$. Show that G
contains an element g that doesn’t fix any element of S.

M.11. Determine the conjugacy classes of elements of order 2 in $GL_2(\mathbb{Z})$.

*M.12. (class equation of $SL_2$) Many, though not all, conjugacy classes in $SL_2(F)$ contain
matrices of the form $A = \begin{pmatrix} 0 & -1 \\ 1 & a \end{pmatrix}$.

(a) Determine the centralizers in $SL_2(\mathbb{F}_5)$ of the matrices A, for a = 0, 1, 2, 3, 4.
(b) Determine the class equation of $SL_2(\mathbb{F}_5)$.
(c) How many solutions of an equation of the form $x^2 + axy + y^2 = 1$ in $\mathbb{F}_p$ might there
be? To analyze this, one can begin by setting $y = \lambda x + 1$. For most values of $\lambda$ there
will be two solutions, one of which is x = 0, y = 1.
(d) Determine the class equation of $SL_2(\mathbb{F}_p)$.
CHAPTER 8


Bilinear Forms


I presume that to the uninitiated
the formulae will appear cold and cheerless. 


—Benjamin Pierce 


8.1 BILINEAR FORMS


The dot product $(X \cdot Y) = X^tY = x_1y_1 + \cdots + x_ny_n$ on $\mathbb{R}^n$ was discussed in Chapter 5.
It is symmetric: $(Y \cdot X) = (X \cdot Y)$, and positive definite: $(X \cdot X) > 0$ for every $X \neq 0$. We
examine several analogues of dot product in this chapter. The most important ones are
symmetric forms and Hermitian forms. All vector spaces in this chapter are assumed to be
finite-dimensional.

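Let V be a real vector space. A bilinear form on V is a real-valued function of two
vector variables, that is, a map $V \times V \to \mathbb{R}$. Given a pair v, w of vectors, the form returns a real
number that will usually be denoted by $\langle v, w \rangle$. A bilinear form is required to be linear in
each variable:

(8.1.1)  $\langle rv_1, w_1 \rangle = r\langle v_1, w_1 \rangle$  and  $\langle v_1 + v_2, w_1 \rangle = \langle v_1, w_1 \rangle + \langle v_2, w_1 \rangle$
         $\langle v_1, rw_1 \rangle = r\langle v_1, w_1 \rangle$  and  $\langle v_1, w_1 + w_2 \rangle = \langle v_1, w_1 \rangle + \langle v_1, w_2 \rangle$

for all $v_i$ and $w_i$ in V and all real numbers r. Another way to say this is that the form is
compatible with linear combinations in each variable:

(8.1.2)  $\langle \sum x_i v_i, w \rangle = \sum x_i \langle v_i, w \rangle$
         $\langle v, \sum w_j y_j \rangle = \sum \langle v, w_j \rangle y_j$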
for all vectors $v_i$ and $w_j$ and all real numbers $x_i$ and $y_j$. (It is often convenient to bring
scalars in the second variable out to the right side.)

The form on $\mathbb{R}^n$ defined by

(8.1.3)  $\langle X, Y \rangle = X^tAY$,

where A is an $n \times n$ matrix, is an example of a bilinear form. The dot product is the case
A = I, and when one is working with real column vectors, one always assumes that the form
is dot product unless a different form has been specified.
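
As a small illustration of (8.1.3), the following Python sketch evaluates $X^tAY$ and checks linearity in the first variable on one example. The function name `form` and the sample matrix and vectors are assumptions made for this illustration; they do not come from the text.

```python
def form(A, X, Y):
    """Evaluate the bilinear form X^t A Y, with A given as a list of rows."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))

# Sample data (hypothetical, chosen only for illustration).
A = [[1, 2], [0, 3]]
X1, X2, Y = [1, -1], [2, 5], [4, 7]
r = 3

# Linearity in the first variable: <r*X1 + X2, Y> = r<X1, Y> + <X2, Y>.
combo = [r * a + b for a, b in zip(X1, X2)]
lhs = form(A, combo, Y)
rhs = r * form(A, X1, Y) + form(A, X2, Y)
print(lhs == rhs)

# With A = I the form reduces to the dot product.
identity = [[1, 0], [0, 1]]
print(form(identity, X1, Y) == sum(a * b for a, b in zip(X1, Y)))
```

A single numerical check does not prove bilinearity, of course; it only shows the rules (8.1.1) at work on concrete numbers.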
If a basis $\mathbf{B} = (v_1, \ldots, v_n)$ of V is given, a bilinear form $\langle\ ,\ \rangle$ can be related to a form
of the type (8.1.3) by the matrix of the form. This matrix is simply $A = (a_{ij})$, where

(8.1.4)  $a_{ij} = \langle v_i, v_j \rangle$.

Proposition 8.1.5 Let $\langle\ ,\ \rangle$ be a bilinear form on a vector space V, let $\mathbf{B} = (v_1, \ldots, v_n)$ be a
basis of V, and let A be the matrix of the form with respect to that basis. If X and Y are the
coordinate vectors of the vectors v and w, respectively, then

$\langle v, w \rangle = X^tAY$.


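Proof. If $v = \mathbf{B}X$ and $w = \mathbf{B}Y$, then

$\langle v, w \rangle = \langle \sum_i v_i x_i, \sum_j v_j y_j \rangle = \sum_{i,j} x_i \langle v_i, v_j \rangle y_j = \sum_{i,j} x_i a_{ij} y_j = X^tAY$.  □
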
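
Proposition 8.1.5 can be checked on a concrete case. The sketch below takes dot product on $\mathbb{R}^2$ as the form, builds its matrix with respect to a non-standard basis as in (8.1.4), and compares the two ways of computing the form. The basis, the coordinate vectors, and the helper names `gram_matrix` and `combine` are hypothetical choices made for this example.

```python
def dot(u, v):
    """Ordinary dot product, used here as the bilinear form."""
    return sum(a * b for a, b in zip(u, v))

def gram_matrix(basis, pairing):
    """Matrix A = (a_ij) with a_ij = <v_i, v_j>, as in (8.1.4)."""
    return [[pairing(vi, vj) for vj in basis] for vi in basis]

def combine(basis, coords):
    """Return the vector BX = sum_i v_i x_i."""
    dim = len(basis[0])
    return [sum(v[k] * x for v, x in zip(basis, coords)) for k in range(dim)]

basis = [[1, 1], [1, 2]]        # v_1 = (1, 1), v_2 = (1, 2)
A = gram_matrix(basis, dot)     # matrix of the form in this basis
X, Y = [2, -1], [1, 1]          # coordinate vectors of v and w
v, w = combine(basis, X), combine(basis, Y)

direct = dot(v, w)              # <v, w> computed from the vectors
via_matrix = sum(X[i] * A[i][j] * Y[j] for i in range(2) for j in range(2))
print(A, direct, via_matrix)
```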
A bilinear form is symmetric if $\langle v, w \rangle = \langle w, v \rangle$ for all v and w in V, and skew-symmetric
if $\langle v, w \rangle = -\langle w, v \rangle$ for all v and w in V. When we refer to a symmetric form, we
mean a bilinear symmetric form, and similarly, reference to a skew-symmetric form implies
bilinearity.


Lemma 8.1.6 


(a) Let A be an $n \times n$ matrix. The form $X^tAY$ is symmetric: $X^tAY = Y^tAX$ for all X and Y,
if and only if the matrix A is symmetric: $A^t = A$.

(b) A bilinear form $\langle\ ,\ \rangle$ is symmetric if and only if its matrix with respect to an arbitrary
basis is a symmetric matrix.

The analogous statements are true when the word symmetric is replaced by skew-symmetric.

Proof. (a) Assume that $A = (a_{ij})$ is a symmetric matrix. Thinking of $X^tAY$ as a $1 \times 1$ matrix,
it is equal to its transpose. Then $X^tAY = (X^tAY)^t = Y^tA^tX = Y^tAX$. Thus the form is
symmetric. To derive the other implication, we note that $e_i^tAe_j = a_{ij}$, while $e_j^tAe_i = a_{ji}$. In
order for the form to be symmetric, we must have $a_{ij} = a_{ji}$.

(b) This follows from (a) because $\langle v, w \rangle = X^tAY$.  □


The effect of a change of basis on the matrix of a form is determined in the usual way. 


Proposition 8.1.7 Let $\langle\ ,\ \rangle$ be a bilinear form on a real vector space V, and let A and A' be
the matrices of the form with respect to two bases $\mathbf{B}$ and $\mathbf{B}'$. If P is the matrix of change of
basis, so that $\mathbf{B}' = \mathbf{B}P$, then

$A' = P^tAP$.

Proof. Let X and X' be the coordinate vectors of a vector v with respect to the bases $\mathbf{B}$ and
$\mathbf{B}'$. Then $v = \mathbf{B}X = \mathbf{B}'X'$, and PX' = X. With analogous notation, $w = \mathbf{B}Y = \mathbf{B}'Y'$,

$\langle v, w \rangle = X^tAY = (PX')^tA(PY') = X'^t(P^tAP)Y'$.

This identifies $P^tAP$ as the matrix of the form with respect to the basis $\mathbf{B}'$.  □
Corollary 8.1.8 Let A be the matrix of a bilinear form with respect to a basis. The matrices
that represent the same form with respect to different bases are the matrices $P^tAP$, where P
can be any invertible matrix.  □


Note: There is an important observation to be made here. When a basis is given, both linear
operators and bilinear forms are described by matrices. It may be tempting to think that
the theories of linear operators and of bilinear forms are equivalent in some way. They are
not equivalent. When one makes a change of basis, the matrix of the bilinear form $X^tAY$
changes to $P^tAP$, while the matrix of the linear operator Y = AX changes to $P^{-1}AP$. The
matrices obtained with respect to the new basis will most often be different.  □
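
The difference between the two transformation rules is easy to see on a small example. In the sketch below, the matrices A and P are arbitrary sample choices (P has determinant 1, so its inverse is written out by hand), and the helpers `matmul` and `transpose` are defined only for this illustration.

```python
def matmul(M, N):
    """Product of two matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

A = [[2, 1], [1, 3]]
P = [[1, 1], [0, 1]]
P_inv = [[1, -1], [0, 1]]       # inverse of P, written out by hand
print(matmul(P, P_inv))         # sanity check: the identity matrix

form_matrix = matmul(matmul(transpose(P), A), P)   # P^t A P (bilinear form)
operator_matrix = matmul(matmul(P_inv, A), P)      # P^{-1} A P (linear operator)
print(form_matrix)
print(operator_matrix)
```

Here $P^tAP$ is again symmetric, as Lemma 8.1.6 requires, while $P^{-1}AP$ is not.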


8.2 SYMMETRIC FORMS 


Let V be a real vector space. A symmetric form on V is positive definite if $\langle v, v \rangle > 0$ for all
nonzero vectors v, and positive semi-definite if $\langle v, v \rangle \geq 0$ for all nonzero vectors v. Negative
definite and negative semi-definite forms are defined analogously. Dot product is a symmetric,
positive definite form on $\mathbb{R}^n$.

A symmetric form that is neither positive nor negative semi-definite is called indefinite. 
The Lorentz form 


(8.2.1)  $\langle X, Y \rangle = x_1y_1 + x_2y_2 + x_3y_3 - x_4y_4$

is an indefinite symmetric form on "space-time" $\mathbb{R}^4$, where $x_4$ is the "time" coordinate, and
the speed of light is normalized to 1. Its matrix with respect to the standard basis of $\mathbb{R}^4$ is

(8.2.2)  $\begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{pmatrix}$.
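
To see concretely why the Lorentz form is indefinite, one can evaluate $\langle X, X \rangle$ on a few vectors. The following sketch does this; the function name `lorentz` and the three sample vectors are assumptions chosen for illustration.

```python
def lorentz(X, Y):
    """The Lorentz form (8.2.1) on R^4."""
    return X[0] * Y[0] + X[1] * Y[1] + X[2] * Y[2] - X[3] * Y[3]

e1 = [1, 0, 0, 0]     # a "space" direction
e4 = [0, 0, 0, 1]     # the "time" direction
v = [1, 0, 0, 1]      # a nonzero vector with <v, v> = 0

print(lorentz(e1, e1), lorentz(e4, e4), lorentz(v, v))
```

The form takes a positive value, a negative value, and the value zero on nonzero vectors, so it is neither positive nor negative semi-definite.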


As an introduction to the study of symmetric forms, we ask what happens to dot
product when we change coordinates. The effect of the change of basis from the standard
basis $\mathbf{E}$ to a new basis $\mathbf{B}'$ is given by Proposition 8.1.7. If $\mathbf{B}' = \mathbf{E}P$, the matrix I of dot product
changes to $A' = P^tIP = P^tP$, or in terms of the form, if PX' = X and PY' = Y, then

(8.2.3)  $X^tY = X'^tA'Y'$, where $A' = P^tP$.

If the change of basis is orthogonal, then $P^tP$ is the identity matrix, and $(X \cdot Y) = (X' \cdot Y')$.
But under a general change of basis, the formula for dot product changes as indicated.

This raises a question: Which of the bilinear forms $X^tAY$ are equivalent to dot product,
in the sense that they represent dot product with respect to some basis of $\mathbb{R}^n$? Formula
(8.2.3) gives a theoretical answer:


Corollary 8.2.4 The matrices A that represent a form $\langle X, Y \rangle = X^tAY$ equivalent to dot
product are those that can be written as a product $P^tP$, for some invertible matrix P.  □


This answer won’t be satisfactory until we can decide which matrices A can be written
as such a product. One condition that A must satisfy is very simple: It must be
symmetric, because $P^tP$ is always a symmetric matrix. Another condition comes from the
fact that dot product is positive definite.

In analogy with the terminology for symmetric forms, a symmetric real matrix A is
called positive definite if $X^tAX > 0$ for all nonzero column vectors X. If the form $X^tAY$ is
equivalent to dot product, the matrix A will be positive definite.

The two conditions, symmetry and positive definiteness, characterize matrices that
represent dot product.


Theorem 8.2.5 The following properties of a real $n \times n$ matrix A are equivalent:


(i) The form $X^tAY$ represents dot product, with respect to some basis of $\mathbb{R}^n$.
(ii) There is an invertible matrix P such that $A = P^tP$.
(iii) The matrix A is symmetric and positive definite.


We have seen that (i) and (ii) are equivalent (Corollary 8.2.4) and that (i) implies (iii).
We will prove that (iii) implies (i) in Section 8.4 (see (8.4.18)).
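
The implication from (ii) to (iii) rests on the identity $X^t(P^tP)X = (PX)^t(PX)$, which is the squared length of PX. The sketch below checks this on a small grid of integer vectors; the invertible matrix P and the helper names are hypothetical choices for this example, and a finite grid is of course only a spot check.

```python
def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def matvec(M, X):
    return [sum(m * x for m, x in zip(row, X)) for row in M]

def quad(A, X):
    """The value X^t A X."""
    return sum(X[i] * A[i][j] * X[j] for i in range(len(X)) for j in range(len(X)))

P = [[1, 2], [0, 1]]               # an invertible matrix (determinant 1)
A = matmul(transpose(P), P)        # A = P^t P

vectors = [[x, y] for x in range(-3, 4) for y in range(-3, 4) if (x, y) != (0, 0)]
is_symmetric = A == transpose(A)
all_positive = all(quad(A, X) > 0 for X in vectors)
matches_length = all(quad(A, X) == sum(c * c for c in matvec(P, X)) for X in vectors)
print(A, is_symmetric, all_positive, matches_length)
```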


8.3 HERMITIAN FORMS


The most useful way to extend the concept of symmetric forms to complex vector spaces is
with Hermitian forms. A Hermitian form on a complex vector space V is a map $V \times V \to \mathbb{C}$,
denoted by $\langle v, w \rangle$, that is conjugate linear in the first variable, linear in the second variable,
and Hermitian symmetric:

(8.3.1)  $\langle cv_1, w_1 \rangle = \overline{c}\langle v_1, w_1 \rangle$  and  $\langle v_1 + v_2, w_1 \rangle = \langle v_1, w_1 \rangle + \langle v_2, w_1 \rangle$
         $\langle v_1, cw_1 \rangle = c\langle v_1, w_1 \rangle$  and  $\langle v_1, w_1 + w_2 \rangle = \langle v_1, w_1 \rangle + \langle v_1, w_2 \rangle$
         $\langle w_1, v_1 \rangle = \overline{\langle v_1, w_1 \rangle}$

for all $v_i$ and $w_i$ in V, and all complex numbers c, where the overline denotes complex
conjugation. As with bilinear forms (8.1.2), this condition can be expressed in terms of linear
combinations in the variables:

(8.3.2)  $\langle \sum x_i v_i, w \rangle = \sum \overline{x}_i \langle v_i, w \rangle$
         $\langle v, \sum w_j y_j \rangle = \sum \langle v, w_j \rangle y_j$

for any vectors $v_i$ and $w_j$ and any complex numbers $x_i$ and $y_j$. Because of Hermitian
symmetry, $\langle v, v \rangle = \overline{\langle v, v \rangle}$, and therefore $\langle v, v \rangle$ is a real number, for all vectors v.

The standard Hermitian form on $\mathbb{C}^n$ is the form

(8.3.3)  $\langle X, Y \rangle = X^*Y = \overline{x}_1y_1 + \cdots + \overline{x}_ny_n$,

where the notation $X^*$ stands for the conjugate transpose $(\overline{x}_1, \ldots, \overline{x}_n)$ of $X = (x_1, \ldots, x_n)^t$.
When working with $\mathbb{C}^n$, one always assumes that the form is the standard Hermitian form,
unless another form has been specified.

The reason that the complication caused by complex conjugation is introduced is that
$\langle X, X \rangle$ becomes a positive real number for every nonzero complex vector X. If we use
the bijective correspondence of complex n-dimensional vectors with real 2n-dimensional
vectors, by

(8.3.4)  $(x_1, \ldots, x_n)^t \longleftrightarrow (a_1, b_1, \ldots, a_n, b_n)^t$,

where $x_\nu = a_\nu + b_\nu i$, then $\overline{x}_\nu = a_\nu - b_\nu i$ and

$\langle X, X \rangle = \overline{x}_1x_1 + \cdots + \overline{x}_nx_n = a_1^2 + b_1^2 + \cdots + a_n^2 + b_n^2$.

Thus $\langle X, X \rangle$ is the square of the length of the corresponding real vector, a positive real
number.
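
The standard Hermitian form is convenient to experiment with using Python's built-in complex numbers. The sketch below evaluates (8.3.3) and compares $\langle X, X \rangle$ with the squared length of the real vector given by (8.3.4); the function name `hermitian` and the sample vectors are assumptions made for this illustration.

```python
def hermitian(X, Y):
    """Standard Hermitian form X^* Y = sum of conj(x_i) * y_i."""
    return sum(x.conjugate() * y for x, y in zip(X, Y))

X = [1 + 2j, 3j]
Y = [2 + 0j, 1 - 1j]

# <X, X> is real and equals the squared length of (a_1, b_1, ..., a_n, b_n).
real_vector = [part for x in X for part in (x.real, x.imag)]
print(hermitian(X, X), sum(t * t for t in real_vector))

# Hermitian symmetry: <Y, X> is the complex conjugate of <X, Y>.
print(hermitian(X, Y), hermitian(Y, X))
```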
For arbitrary vectors X and Y, the symmetry property of dot product is replaced by
Hermitian symmetry: $\langle Y, X \rangle = \overline{\langle X, Y \rangle}$. Bear in mind that when $X \neq Y$, $\langle X, Y \rangle$ is likely to
be a complex number, whereas dot product of the corresponding real vectors would be
real. Though elements of $\mathbb{C}^n$ correspond bijectively to elements of $\mathbb{R}^{2n}$, as above, these two
vector spaces aren’t equivalent, because scalar multiplication by a complex number isn’t
defined on $\mathbb{R}^{2n}$.


The adjoint $A^*$ of a complex matrix $A = (a_{ij})$ is the complex conjugate of the transpose
matrix $A^t$, a notation that was used above for column vectors. So the i, j entry of $A^*$ is $\overline{a}_{ji}$.

For example, $\begin{pmatrix} 1 & 1+i \\ 2 & i \end{pmatrix}^* = \begin{pmatrix} 1 & 2 \\ 1-i & -i \end{pmatrix}$.

Here are some rules for computing with adjoint matrices:

(8.3.5)  $(cA)^* = \overline{c}A^*$,  $(A + B)^* = A^* + B^*$,  $(AB)^* = B^*A^*$,  $A^{**} = A$.

A square matrix A is Hermitian (or self-adjoint) if

(8.3.6)  $A^* = A$.

The entries of a Hermitian matrix A satisfy the relation $a_{ji} = \overline{a}_{ij}$. Its diagonal entries are
real and the entries below the diagonal are the complex conjugates of those above it:

(8.3.7)  $A = \begin{pmatrix} r_1 & & a_{ij} \\ & \ddots & \\ \overline{a}_{ij} & & r_n \end{pmatrix}$,  $r_i \in \mathbb{R}$.

For example, $\begin{pmatrix} 2 & i \\ -i & 3 \end{pmatrix}$ is a Hermitian matrix. A real matrix is Hermitian if and only if it is
symmetric.
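
The rules (8.3.5) and the definition (8.3.6) can be tried out directly. The following sketch computes adjoints of small complex matrices; the helper names `adjoint` and `matmul` and the sample matrices are assumptions chosen for illustration.

```python
def adjoint(A):
    """Conjugate transpose: the (i, j) entry of A^* is conj(a_ji)."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[1 + 0j, 1 + 1j], [2 + 0j, 1j]]
H = [[2 + 0j, 1j], [-1j, 3 + 0j]]

print(adjoint(A))                      # conjugate transpose of A
print(adjoint(H) == H)                 # H is Hermitian
print(adjoint(adjoint(A)) == A)        # A** = A
print(adjoint(matmul(A, H)) == matmul(adjoint(H), adjoint(A)))  # (AB)* = B* A*
```

The entries are small Gaussian integers, so the floating-point arithmetic is exact and the equality tests are reliable here; for general complex entries one would compare up to a tolerance.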
The matrix of a Hermitian form with respect to a basis $\mathbf{B} = (v_1, \ldots, v_n)$ is defined as
for bilinear forms. It is $A = (a_{ij})$, where $a_{ij} = \langle v_i, v_j \rangle$. The matrix of the standard Hermitian
form on $\mathbb{C}^n$ is the identity matrix.


Proposition 8.3.8 Let A be the matrix of a Hermitian form $\langle\ ,\ \rangle$ on a complex vector space
V, with respect to a basis $\mathbf{B}$. If X and Y are the coordinate vectors of the vectors v
and w, respectively, then $\langle v, w \rangle = X^*AY$ and A is a Hermitian matrix. Conversely, if A
is a Hermitian matrix, then the form on $\mathbb{C}^n$ defined by $\langle X, Y \rangle = X^*AY$ is a Hermitian
form.


The proof is analogous to that of Proposition 8.1.5.  □
Recall that if the form is Hermitian, $\langle v, v \rangle$ is a real number. A Hermitian form is
positive definite if $\langle v, v \rangle$ is positive for every nonzero vector v, and a Hermitian matrix
is positive definite if $X^*AX$ is positive for every nonzero complex column vector X. A
Hermitian form is positive definite if and only if its matrix with respect to an arbitrary basis
is positive definite.

The rule for a change of basis $\mathbf{B}' = \mathbf{B}P$ in the matrix of a Hermitian form is determined,
as usual, by substituting PX' = X and PY' = Y:

$X^*AY = (PX')^*A(PY') = X'^*(P^*AP)Y'$.

The matrix of the form with respect to the new basis is

(8.3.9)  $A' = P^*AP$.

Corollary 8.3.10

(a) Let A be the matrix of a Hermitian form with respect to a basis. The matrices that
represent the same form with respect to different bases are those of the form $A' = P^*AP$,
where P can be any invertible complex matrix.

(b) A change of basis $\mathbf{B}' = \mathbf{E}P$ in $\mathbb{C}^n$ changes the standard Hermitian form $X^*Y$ to $X'^*A'Y'$,
where $A' = P^*P$.  □


The next theorem gives the first of the many special properties of Hermitian matrices. 


Theorem 8.3.11 The eigenvalues, the trace, and the determinant of a Hermitian matrix A
are real numbers.


Proof. Since the trace and determinant can be expressed in terms of the eigenvalues, it
suffices to show that the eigenvalues of a Hermitian matrix A are real. Let X be an eigenvector
of A with eigenvalue $\lambda$. Then

$X^*AX = X^*(AX) = X^*(\lambda X) = \lambda X^*X$.

We note that $(\lambda X)^* = \overline{\lambda}X^*$. Since $A^* = A$,

$X^*AX = (X^*A)X = (X^*A^*)X = (AX)^*X = (\lambda X)^*X = \overline{\lambda}X^*X$.

So $\lambda X^*X = \overline{\lambda}X^*X$. Since $X^*X$ is a positive real number, it is not zero. Therefore $\lambda = \overline{\lambda}$,
which means that $\lambda$ is real.  □
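
For a $2 \times 2$ matrix the eigenvalues are the roots of $t^2 - (\operatorname{trace})\,t + \det$, so the theorem can be checked numerically with the quadratic formula. The sketch below does this for one Hermitian matrix; the function name `eigenvalues_2x2` and the sample matrix are assumptions made for this illustration.

```python
import cmath

def eigenvalues_2x2(M):
    """Roots of the characteristic polynomial t^2 - tr(M) t + det(M)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

H = [[2 + 0j, 1j], [-1j, 3 + 0j]]     # a Hermitian matrix
ev1, ev2 = eigenvalues_2x2(H)
print(ev1, ev2)                        # imaginary parts are zero
print(ev1 + ev2, ev1 * ev2)            # trace and determinant, both real
```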


Please go over this proof carefully. It is simple, but so tricky that it seems hard to trust. Here 
is a startling corollary: 


Corollary 8.3.12 The eigenvalues of a real symmetric matrix are real numbers. 


Proof. When a real symmetric matrix is regarded as a complex matrix, it is Hermitian, so 
the corollary follows from the theorem. O 


This corollary would be difficult to prove without going over to complex matrices, though it
can be checked directly for a real symmetric $2 \times 2$ matrix.
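
For the reader who wants to see the direct check in the $2 \times 2$ case, here is one way to write it out. This short derivation is added for illustration, and the entry names a, b, d are our own choice.

```latex
A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}, \qquad
\det(tI - A) = t^2 - (a + d)\,t + (ad - b^2).
% The roots are real exactly when the discriminant is nonnegative:
(a + d)^2 - 4(ad - b^2) = (a - d)^2 + 4b^2 \;\geq\; 0 .
```

Since the discriminant is a sum of two squares of real numbers, both eigenvalues are real.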
A matrix P such that

(8.3.13)  $P^*P = I$,  (or $P^* = P^{-1}$)

is called a unitary matrix. A matrix P is unitary if and only if its columns $P_1, \ldots, P_n$ are
orthonormal with respect to the standard Hermitian form, i.e., if and only if $P_i^*P_i = 1$ and
$P_i^*P_j = 0$ when $i \neq j$. For example, the matrix $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$ is unitary.

The unitary matrices form a subgroup of the complex general linear group called the
unitary group. It is denoted by $U_n$:

(8.3.14)  $U_n = \{P \mid P^*P = I\}$.

We have seen that a change of basis in $\mathbb{R}^n$ preserves dot product if and only if the
change of basis matrix is orthogonal (5.1.14). Similarly, a change of basis in $\mathbb{C}^n$ preserves
the standard Hermitian form $X^*Y$ if and only if the change of basis matrix is unitary (see
(8.3.10)(b)).
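
The defining condition (8.3.13) and the fact that a unitary matrix preserves the standard Hermitian form can both be checked numerically. The sketch below uses the matrix $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$; the helper names, the sample vectors, and the tolerance `1e-12` are assumptions chosen for this illustration, and the comparisons are approximate because of floating-point rounding.

```python
import math

def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def matvec(M, X):
    return [sum(m * x for m, x in zip(row, X)) for row in M]

def hermitian(X, Y):
    return sum(x.conjugate() * y for x, y in zip(X, Y))

s = 1 / math.sqrt(2)
P = [[s, s * 1j], [s * 1j, s]]

# P^* P should be the identity matrix, up to rounding.
product = matmul(adjoint(P), P)
is_unitary = all(abs(product[i][j] - (1 if i == j else 0)) < 1e-12
                 for i in range(2) for j in range(2))

# A unitary matrix preserves the standard Hermitian form.
X, Y = [1 + 2j, 3j], [2 + 0j, 1 - 1j]
preserved = abs(hermitian(matvec(P, X), matvec(P, Y)) - hermitian(X, Y)) < 1e-12
print(is_unitary, preserved)
```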


8.4 ORTHOGONALITY 


In this section we describe, at the same time, symmetric (bilinear) forms on a real vector 
space and Hermitian forms on a complex vector space. Throughout the section, we assume 
that we are given either a finite-dimensional real vector space V with a symmetric form, 
or a finite-dimensional complex vector space V with a Hermitian form. We won’t assume 
that the given form is positive definite. Reference to a symmetric form indicates that V is a 
real vector space, while reference to a Hermitian form indicates that V is a complex vector 
space. Though everything we do applies to both cases, it may be best for you to think of a 
symmetric form on a real vector space when reading this for the first time. 

In order to include Hermitian forms, bars will have to be put over some symbols. Since 
complex conjugation is the identity operation on the real numbers, we can ignore bars when 
considering symmetric forms. Also, the adjoint of a real matrix is equal to its transpose. 
When a matrix A is real, A* is the transpose of A. 


We assume given a symmetric or Hermitian form on a finite-dimensional vector space 
V. The basic concept used to study the form is orthogonality. 


• Two vectors v and w are orthogonal (written $v \perp w$) if
$\langle v, w \rangle = 0$.


This extends the definition given before when the form is dot product. Note that $v \perp w$ if and
only if $w \perp v$.

What orthogonality of real vectors means geometrically depends on the form and also
on a basis. One peculiar thing is that, when the form is indefinite, a nonzero vector v may
be self-orthogonal: $\langle v, v \rangle = 0$. Rather than trying to understand the geometric meaning of
orthogonality for each symmetric form, it is best to work algebraically with the definition of
orthogonality, $\langle v, w \rangle = 0$, and let it go at that.
If W is a subspace of V, we may restrict the form on V to W, which means simply that
we take the same form but look at it only when the vectors are in W. It is obvious that if the
form on V is symmetric, Hermitian, or positive definite, then its restriction to W will have
the same property.


• The orthogonal space to a subspace W of V, often denoted by $W^\perp$, is the subspace of
vectors v that are orthogonal to every vector in W, or symbolically, such that $v \perp W$:

(8.4.1)  $W^\perp = \{v \in V \mid \langle v, w \rangle = 0 \text{ for all } w \text{ in } W\}$.


• An orthogonal basis $\mathbf{B} = (v_1, \ldots, v_n)$ of V is a basis whose vectors are mutually
orthogonal: $\langle v_i, v_j \rangle = 0$ for all indices i and j with $i \neq j$. The matrix of the form with respect
to an orthogonal basis will be a diagonal matrix, and the form will be nondegenerate (see
below) if and only if the diagonal entries $\langle v_i, v_i \rangle$ of the matrix are nonzero (see (8.4.4)(b)).


• A null vector v in V is a vector orthogonal to every vector in V, and the nullspace N of
the form is the set of null vectors. The nullspace can be described as the orthogonal space to
the whole space V:

$N = \{v \mid v \perp V\} = V^\perp$.


• The form on V is nondegenerate if its nullspace is the zero space {0}. This means that
for every nonzero vector v, there is a vector v' such that $\langle v, v' \rangle \neq 0$. A form that isn’t
nondegenerate is degenerate. The most interesting forms are nondegenerate.


• The form on V is nondegenerate on a subspace W if its restriction to W is a nondegenerate
form, which means that for every nonzero vector w in W, there is a vector w', also in W,
such that $\langle w, w' \rangle \neq 0$. A form may be degenerate on a subspace, though it is nondegenerate
on the whole space, and vice versa.


Lemma 8.4.2 The form is nondegenerate on W if and only if $W \cap W^\perp = \{0\}$.  □
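
A small example may help to show how a form can be nondegenerate on V yet degenerate on a subspace. The sketch below uses the indefinite form $x_1y_1 - x_2y_2$ on $\mathbb{R}^2$ and the line W spanned by (1, 1); the form, the subspace, and the grid of test vectors are hypothetical choices made for this illustration.

```python
def form(X, Y):
    """The symmetric form x1*y1 - x2*y2 on R^2, with matrix diag(1, -1)."""
    return X[0] * Y[0] - X[1] * Y[1]

w = [1, 1]                       # W is the line spanned by w

# w is orthogonal to itself, so w lies in both W and the orthogonal space
# of W: the intersection is not {0}, and the form is degenerate on W.
print(form(w, w))

# On a grid of integer vectors, v is orthogonal to w exactly when v is on W.
grid = [[x, y] for x in range(-2, 3) for y in range(-2, 3)]
orthogonal_to_w = [v for v in grid if form(v, w) == 0]
on_line = [v for v in grid if v[0] == v[1]]
print(orthogonal_to_w == on_line)

# The form is nondegenerate on all of R^2: its matrix diag(1, -1) has
# nonzero determinant.
determinant = 1 * (-1) - 0 * 0
print(determinant)
```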


There is an important criterion for equality of vectors in terms of a nondegenerate 
form. 


Proposition 8.4.3 Let $\langle\ ,\ \rangle$ be a nondegenerate symmetric or Hermitian form on V, and let
v and v' be vectors in V. If $\langle v, w \rangle = \langle v', w \rangle$ for all vectors w in V, then v = v'.


Proof. If $\langle v, w \rangle = \langle v', w \rangle$, then v - v' is orthogonal to w. If this is true for all w in V, then
v - v' is a null vector, and because the form is nondegenerate, v - v' = 0.  □


Proposition 8.4.4 Let (, ) be a symmetric form on a real vector space or a Hermitian form 
on a complex vector space, and let A be its matrix with respect to a basis. 


(a) A vector v is a null vector if and only if its coordinate vector Y solves the homogeneous 
equation AY = 0. 


(b) The form is nondegenerate if and only if the matrix A is invertible. 


Proof. Via the basis, the form corresponds to the form X*AY, so we may as well work with
that form. If Y is a vector such that AY = 0, then X*AY = 0 for all X, which means that Y
is orthogonal to every vector, i.e., it is a null vector. Conversely, if AY ≠ 0, then AY has a
nonzero coordinate. The matrix product e_i*AY picks out the ith coordinate of AY. So one of
those products is not zero, and therefore Y is not a null vector. This proves (a). Because A is
invertible if and only if the equation AY = 0 has no nontrivial solution, (b) follows. □
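
Proposition 8.4.4 turns both questions into matrix computations. The following is a minimal sketch, assuming a real symmetric form given by its matrix as a list of rows; the helper names and the two sample matrices are illustrative and do not come from the text.

```python
from fractions import Fraction

def mat_vec(A, Y):
    """Return the product AY for a matrix A (list of rows) and a vector Y."""
    return [sum(a * y for a, y in zip(row, Y)) for row in A]

def det(A):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result  # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return result

def is_nondegenerate(A):
    """Proposition 8.4.4(b): the form is nondegenerate exactly when A is invertible."""
    return det(A) != 0

# Hypothetical examples (not from the text).
diag_form = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]  # indefinite but nondegenerate
rank_one = [[1, 1], [1, 1]]                      # degenerate
null_vector = [1, -1]                            # solves AY = 0 for rank_one
```

Here `mat_vec(rank_one, null_vector)` is the zero vector, so by part (a) the vector with coordinates (1, −1) is a null vector of the second form.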


Theorem 8.4.5 Let ( , ) be a symmetric form on a real vector space V or a Hermitian form 
on a complex vector space V, and let W be a subspace of V. 


(a) The form is nondegenerate on W if and only if V is the direct sum W ⊕ W⊥.
(b) If the form is nondegenerate on V and on W, then it is nondegenerate on W⊥.


When a vector space V is a direct sum W_1 ⊕ ··· ⊕ W_k and W_i is orthogonal to W_j for
i ≠ j, V is said to be the orthogonal sum of the subspaces. The theorem asserts that if the
form is nondegenerate on W, then V is the orthogonal sum of W and W⊥.


Proof of Theorem 8.4.5. (a) The conditions for a direct sum are W ∩ W⊥ = {0} and
V = W + W⊥ (3.6.6)(c). The first condition simply restates the hypothesis that the form
be nondegenerate on the subspace. So if V is the direct sum, the form is nondegenerate.
We must show that if the form is nondegenerate on W, then every vector v in V can be
expressed as a sum v = w + u, with w in W and u in W⊥.


We extend a basis (w_1, ..., w_k) of W to a basis B = (w_1, ..., w_k; v_1, ..., v_{n−k}) of
V, and we write the matrix of the form with respect to this basis in block form

(8.4.6)    M = \begin{bmatrix} A & B \\ C & D \end{bmatrix},

where A is the upper left k × k submatrix.

The entries of the block A are (w_i, w_j) for i, j = 1, ..., k, so A is the matrix of the
form restricted to W. Since the form is nondegenerate on W, A is invertible. The entries of
the block B are (w_i, v_j) for i = 1, ..., k and j = 1, ..., n − k. If we can choose the vectors
v_1, ..., v_{n−k} so that B becomes zero, those vectors will be orthogonal to the basis of W,
so they will be in the orthogonal space W⊥. Then since B is a basis of V, it will follow that
V = W + W⊥, which is what we want to show.


To achieve B = 0, we change basis using a matrix with a block form

(8.4.7)    P = \begin{bmatrix} I & Q \\ 0 & I \end{bmatrix},

where the block Q remains to be determined. The new basis B’ = BP will have the form
(w_1, ..., w_k; v’_1, ..., v’_{n−k}). The basis of W will not change. The matrix of the form with
respect to the new basis will be

(8.4.8)    M' = P^{*}MP = \begin{bmatrix} I & 0 \\ Q^{*} & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} I & Q \\ 0 & I \end{bmatrix} = \begin{bmatrix} A & AQ + B \\ \cdot & \cdot \end{bmatrix}.

We don’t need to compute the other entries. When we set Q = −A^{-1}B, the upper right block
of M’ becomes zero, as desired.

(b) Suppose that the form is nondegenerate on V and on W. (a) shows that V = W ⊕ W⊥.
If we choose a basis for V by appending bases for W and W⊥, the matrix of the form on V
will be a diagonal block matrix, where the blocks are the matrices of the form restricted to
W and to W⊥. The matrix of the form on V is invertible (8.4.4), so the blocks are invertible.
It follows that the form is nondegenerate on W⊥. □
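
The change of basis used in part (a) of the proof can be tried on a small case. This is only a sketch under stated assumptions: the 3 × 3 real symmetric matrix M below is hypothetical, k is taken to be 1 so that the block A is a single nonzero number, and P* is the transpose because the entries are real.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

# Hypothetical symmetric matrix M with k = 1, so A = [2] and B = [1 3].
M = [[Fraction(x) for x in row] for row in [[2, 1, 3], [1, 0, 1], [3, 1, 1]]]
a = M[0][0]                      # the 1 x 1 block A, invertible since a != 0
Q = [-b / a for b in M[0][1:]]   # Q = -A^{-1} B

# P = [[I, Q], [0, I]] in block form.
P = [[Fraction(1), Q[0], Q[1]],
     [Fraction(0), Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(1)]]

# Matrix of the form with respect to the new basis: P^t M P.
M_new = mat_mul(transpose(P), mat_mul(M, P))
```

The first row and first column of `M_new` vanish outside the corner entry, which says that the two new basis vectors are orthogonal to W.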


Lemma 8.4.9 If a symmetric or Hermitian form is not identically zero, there is a vector v in
V such that (v, v) ≠ 0.


Proof. If the form is not identically zero, there will be vectors x and y such that (x, y) is not
zero. If the form is Hermitian, we replace y by cy where c is a nonzero complex number, to
make (x, y) real and still not zero. Then (y, x) = (x, y). We expand:


(x + y, x + y) = (x, x) + 2(x, y) + (y, y).


Since the term 2(x, y) isn’t zero, at least one of the three other terms in the equation isn’t
zero. □


Theorem 8.4.10 Let ( , ) be a symmetric form on a real vector space V or a Hermitian form
on a complex vector space V. There exists an orthogonal basis for V. 


Proof. Case 1: The form is identically zero. Then every basis is orthogonal. 


Case 2: The form is not identically zero. By induction on dimension, we may assume that
there is an orthogonal basis for the restriction of the form to any proper subspace of V.
We apply Lemma 8.4.9 and choose a vector v_1 with (v_1, v_1) ≠ 0 as the first vector in our
basis. Let W be the span of (v_1). The matrix of the form restricted to W is the 1 × 1 matrix
whose entry is (v_1, v_1). It is an invertible matrix, so the form is nondegenerate on W. By
Theorem 8.4.5, V = W ⊕ W⊥. By our induction assumption, W⊥ has an orthogonal basis,
say (v_2, ..., v_n). Then (v_1, v_2, ..., v_n) will be an orthogonal basis of V. □


Orthogonal Projection 


Suppose that our given form is nondegenerate on a subspace W. Theorem 8.4.5 tells us that
V is the direct sum W ⊕ W⊥. Every vector v in V can be written uniquely in the form
v = w + u, with w in W and u in W⊥. The orthogonal projection from V to W is the map
π: V → W defined by π(v) = w. The decomposition v = w + u is compatible with sums of
vectors and with scalar multiplication, so π is a linear transformation.

The orthogonal projection is the unique linear transformation from V to W such that
π(w) = w if w is in W and π(u) = 0 if u is in W⊥.


Note: If the form is degenerate on a subspace W, the orthogonal projection to W doesn’t
exist. The reason is that W ∩ W⊥ will contain a nonzero element x, and it will be impossible
to have both π(x) = x and π(x) = 0. □


The next theorem provides a very important formula for orthogonal projection.

Theorem 8.4.11 Projection Formula. Let ( , ) be a symmetric form on a real vector space V
or a Hermitian form on a complex vector space V, and let W be a subspace of V on which
the form is nondegenerate. If (w_1, ..., w_k) is an orthogonal basis for W, the orthogonal
projection π: V → W is given by the formula π(v) = w_1c_1 + ··· + w_kc_k, where


c_i = \frac{(w_i, v)}{(w_i, w_i)}.


Proof. Because the form is nondegenerate on W and its matrix with respect to an orthogonal
basis is diagonal, (w_i, w_i) ≠ 0. The formula makes sense. Given a vector v, let w denote the
vector w_1c_1 + ··· + w_kc_k, with c_i as above. This is an element of W, so if we show that
v − w = u is in W⊥, it will follow that π(v) = w, as the theorem asserts. To show that u is
in W⊥, we show that (w_i, u) = 0 for i = 1, ..., k. We remember that (w_i, w_j) = 0 if i ≠ j.
Then


(w_i, u) = (w_i, v) − (w_i, w) = (w_i, v) − ((w_i, w_1)c_1 + ··· + (w_i, w_k)c_k)
         = (w_i, v) − (w_i, w_i)c_i = 0. □


Warning: This projection formula is not correct unless the basis is orthogonal. 


Example 8.4.12 Let V be the space R^3 of column vectors, and let (v, w) denote the dot
product form. Let W be the subspace spanned by the vector w_1 whose coordinate vector is
(1, 1, 1)^t. Let (x_1, x_2, x_3)^t be the coordinate vector of a vector v. Then (w_1, v) = x_1 + x_2 + x_3.
The projection formula reads π(v) = w_1c, where c = (x_1 + x_2 + x_3)/3. □
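
The projection formula translates directly into a few lines of code. The sketch below assumes the dot product form on R^n and exact rational arithmetic; the function names and the test vectors are illustrative. The last computation feeds the formula a basis that is not orthogonal, to show the failure that the warning above describes.

```python
from fractions import Fraction

def dot(v, w):
    """Dot product form on R^n, computed exactly."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(v, w))

def project(v, orthogonal_basis, form=dot):
    """Apply pi(v) = w_1 c_1 + ... + w_k c_k with c_i = (w_i, v)/(w_i, w_i).

    The result is the orthogonal projection only if the basis is orthogonal.
    """
    result = [Fraction(0)] * len(v)
    for w in orthogonal_basis:
        c = form(w, v) / form(w, w)
        result = [r + c * wi for r, wi in zip(result, w)]
    return result

# Example 8.4.12 with the sample vector v = (1, 2, 6)^t, so c = 9/3 = 3.
v = [1, 2, 6]
p = project(v, [[1, 1, 1]])
u = [a - b for a, b in zip(v, p)]   # u = v - pi(v) should lie in the orthogonal space

# A non-orthogonal basis of the plane x_3 = 0: the formula gives a wrong answer.
wrong = project([0, 1, 5], [[1, 0, 0], [1, 1, 0]])
```

The true projection of (0, 1, 5)^t to the plane x_3 = 0 is (0, 1, 0)^t, while `wrong` comes out as (1/2, 1/2, 0)^t.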


If a form is nondegenerate on the whole space V, the orthogonal projection from V to 
V will be the identity map. The projection formula is interesting in this case too, because it 
can be used to compute the coordinates of a vector v with respect to an orthogonal basis. 


Corollary 8.4.13 Let ( , ) be a nondegenerate symmetric form on a real vector space V
or a nondegenerate Hermitian form on a complex vector space V, let (v_1, ..., v_n) be an
orthogonal basis for V, and let v be any vector. Then v = v_1c_1 + ··· + v_nc_n, where


c_i = \frac{(v_i, v)}{(v_i, v_i)}. □


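Example 8.4.14 Let B = (v_1, v_2, v_3) be the orthogonal basis of R^3 whose coordinate vectors
are


\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}.


Let v be a vector with coordinate vector (x_1, x_2, x_3)^t. Then v = v_1c_1 + v_2c_2 + v_3c_3 and


c_1 = (x_1 + x_2 + x_3)/3,   c_2 = (x_1 − x_2)/2,   c_3 = (x_1 + x_2 − 2x_3)/6. □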
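
The coefficients of Example 8.4.14 can be checked numerically. A small sketch, assuming the dot product on R^3; the test vector (3, 1, 2)^t is arbitrary and the helper names are illustrative.

```python
from fractions import Fraction

def dot(v, w):
    """Dot product form on R^3, computed exactly."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(v, w))

basis = [[1, 1, 1], [1, -1, 0], [1, 1, -2]]   # v_1, v_2, v_3 of Example 8.4.14

def coordinates(v):
    """c_i = (v_i, v) / (v_i, v_i) for the orthogonal basis above."""
    return [dot(b, v) / dot(b, b) for b in basis]

def combine(coeffs):
    """Return the vector v_1 c_1 + v_2 c_2 + v_3 c_3."""
    return [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(3)]

v = [3, 1, 2]          # an arbitrary test vector
c = coordinates(v)     # c_1 = 6/3, c_2 = 2/2, c_3 = 0/6
```

Recombining with `combine(c)` returns the original vector, as Corollary 8.4.13 predicts.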


Next, we consider scaling of the vectors that make up an orthogonal basis.

Corollary 8.4.15 Let ( , ) be a symmetric form on a real vector space V or a Hermitian form
on a complex vector space V.


(a) There is an orthogonal basis B = (v_1, ..., v_n) for V with the property that for each i,
(v_i, v_i) is equal to 1, −1, or 0.

(b) Matrix form: If A is a real symmetric n × n matrix, there is an invertible real matrix P
such that P^tAP is a diagonal matrix, each of whose diagonal entries is 1, −1, or 0. If A
is a complex Hermitian n × n matrix, there is an invertible complex matrix P such that
P*AP is a diagonal matrix, each of whose diagonal entries is 1, −1, or 0.


Proof. (a) Let (v_1, ..., v_n) be an orthogonal basis. If v is a vector, then for any nonzero
real number c, (cv, cv) = c^2(v, v), and c^2 can be any positive real number. So if we multiply
v_i by a scalar, we can adjust the real number (v_i, v_i) by an arbitrary positive real factor.
This proves (a). Part (b) follows in the usual way, by applying (a) to the form X*AY. □


If we arrange an orthogonal basis that has been scaled suitably, the matrix of the form
will have a block decomposition


(8.4.16)    A = \begin{bmatrix} I_p & & \\ & -I_m & \\ & & 0_z \end{bmatrix},


where p, m, and z are the numbers of 1’s, −1’s, and 0’s on the diagonal, and p + m + z = n.
The form is nondegenerate if and only if z = 0.

If the form is nondegenerate, the pair of integers (p, m) is called the signature of the 
form. Sylvester’s Law (see Exercise 4.21) asserts that the signature does not depend on the 
choice of the orthogonal basis. 
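
The scaling step of Corollary 8.4.15 and the counts p, m, z can be sketched as follows. The input is assumed to be the list of values (v_i, v_i) for an orthogonal basis that has already been found; the sample values and function names are hypothetical.

```python
import math

def scale_factors(diagonal):
    """For each value d = (v_i, v_i), return c with c^2 * d equal to 1, -1, or 0."""
    return [1.0 / math.sqrt(abs(d)) if d != 0 else 1.0 for d in diagonal]

def index_counts(diagonal):
    """Return (p, m, z): the numbers of positive, negative, and zero values."""
    p = sum(1 for d in diagonal if d > 0)
    m = sum(1 for d in diagonal if d < 0)
    return p, m, len(diagonal) - p - m

values = [4, -9, 0, 1]    # hypothetical values (v_i, v_i) for an orthogonal basis
factors = scale_factors(values)
# After replacing v_i by c_i v_i, the value (v_i, v_i) becomes c_i^2 (v_i, v_i).
scaled = [c * c * d for c, d in zip(factors, values)]
```

For these values `index_counts` gives (2, 1, 1), so this particular form would be degenerate because z is not zero.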

The notation I_{p,m} is often used to denote the diagonal matrix


(8.4.17)    I_{p,m} = \begin{bmatrix} I_p & \\ & -I_m \end{bmatrix}.


With this notation, the matrix (8.2.2) that represents the Lorentz form is I_{3,1}.

The form is positive definite if and only if m and z are both zero. Then the normalized
basis has the property that (v_i, v_i) = 1 for each i, and (v_i, v_j) = 0 when i ≠ j. This is called
an orthonormal basis, in agreement with the terminology introduced before, for bases of R^n
(5.1.8). An orthonormal basis B refers the form back to dot product on R^n or to the standard
Hermitian form on C^n. That is, if v = BX and w = BY, then (v, w) = X*Y. An orthonormal
basis exists if and only if the form is positive definite.


Note: If B is an orthonormal basis for a subspace W of V, the projection from V to W is given
by the formula π(v) = w_1c_1 + ··· + w_kc_k, where c_i = (w_i, v). The projection formula is
simpler because the denominators (w_i, w_i) in (8.4.11) are equal to 1. However, normalizing
the vectors requires extracting a square root, and because of this, it is sometimes preferable
to work with an orthogonal basis without normalizing. □


The proof of the remaining implication (iii) ⇒ (i) of Theorem 8.2.5 follows from this
discussion:

Corollary 8.4.18 If a real matrix A is symmetric and positive definite, then the form X^tAY
represents dot product with respect to some basis of R^n.


When a positive definite symmetric or Hermitian form is given, the projection formula
provides an inductive method, called the Gram-Schmidt procedure, to produce an orthonormal
basis, starting with an arbitrary basis (v_1, ..., v_n). The procedure is as follows: Let V_k
denote the space spanned by the basis vectors (v_1, ..., v_k). Suppose that, for some k ≤ n,
we have found an orthonormal basis (w_1, ..., w_{k−1}) for V_{k−1}. Let π denote the orthogonal
projection from V to V_{k−1}. Then π(v_k) = w_1c_1 + ··· + w_{k−1}c_{k−1}, where c_i = (w_i, v_k),
and w_k = v_k − π(v_k) is orthogonal to V_{k−1}. When we normalize (w_k, w_k) to 1, the set
(w_1, ..., w_k) will be an orthonormal basis for V_k. □
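
The inductive step just described can be sketched in code. This is a minimal floating-point version for a real positive definite symmetric form; the function names are illustrative, and the matrix [[2, 1], [1, 1]] used for the second example is a hypothetical positive definite matrix, not one taken from the text.

```python
import math

def dot(v, w):
    """Dot product form on R^n."""
    return sum(a * b for a, b in zip(v, w))

def make_form(A):
    """Return the form (x, y) = x^t A y for a real symmetric matrix A."""
    return lambda x, y: sum(x[i] * A[i][j] * y[j]
                            for i in range(len(x)) for j in range(len(y)))

def gram_schmidt(basis, form=dot):
    """Turn a basis into an orthonormal basis for a positive definite form."""
    ortho = []
    for v in basis:
        # Subtract pi(v) = sum of w_i c_i with c_i = (w_i, v); the w_i are unit vectors.
        w = list(v)
        for u in ortho:
            c = form(u, v)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        # Normalize so that (w, w) = 1.
        length = math.sqrt(form(w, w))
        ortho.append([wi / length for wi in w])
    return ortho

# Dot product on R^3, starting from an arbitrary basis.
orthonormal = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

# A different positive definite form on R^2 (hypothetical matrix).
form_A = make_form([[2.0, 1.0], [1.0, 1.0]])
orthonormal_A = gram_schmidt([[1.0, 0.0], [0.0, 1.0]], form_A)
```

As the note above points out, the square roots can be avoided by skipping the normalization and dividing by (w_i, w_i) in the projection step instead.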


The last topic of this section is a criterion for a symmetric form to be positive definite
in terms of its matrix with respect to an arbitrary basis. Let A = (a_ij) be the matrix of a
symmetric form with respect to a basis B = (v_1, ..., v_n) of V, and let A_k denote the k × k
minor made up of the matrix entries a_ij with i, j ≤ k:


A_1 = [a_{11}], \quad A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \ldots, \quad A_n = A.


Theorem 8.4.19 The form and the matrix are positive definite if and only if det A_k > 0 for
k = 1, ..., n.
We leave the proof as an exercise. □


For example, the matrix A = ki ; is positive definite, because det [2] and det A are 


both positive. 
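
The criterion of Theorem 8.4.19 is easy to automate. The sketch below assumes a real symmetric matrix with rational entries given as a list of rows; the two sample matrices are hypothetical and are not the matrix of the example above.

```python
from fractions import Fraction

def det(A):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result  # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return result

def leading_minors(A):
    """Return [det A_1, ..., det A_n] for the upper left k x k submatrices A_k."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_positive_definite(A):
    """Theorem 8.4.19: positive definite if and only if every det A_k is positive."""
    return all(d > 0 for d in leading_minors(A))

tridiagonal = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]   # hypothetical example
indefinite = [[1, 2], [2, 1]]                          # hypothetical example
```

The minors of `tridiagonal` are 2, 3, 4, so it is positive definite, while `indefinite` has minors 1 and −3 and is not.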


8.5 EUCLIDEAN SPACES AND HERMITIAN SPACES 


When we work in R^n, we may wish to change the basis. But if our problem involves dot
products — if length or orthogonality of vectors is involved — a change to an arbitrary 
new basis may be undesirable, because it will not preserve length and orthogonality. It 
is best to restrict oneself to orthonormal bases, so that dot products are preserved. The 
concept of a Euclidean space provides us with a framework in which to do this. A real 
vector space together with a positive definite symmetric form is called a Euclidean space, 
and a complex vector space together with a positive definite Hermitian form is called a 
Hermitian space. 

The space R^n, with dot product, is the standard Euclidean space. An orthonormal
basis for any Euclidean space will refer the space back to the standard Euclidean space.
Similarly, the standard Hermitian form (X, Y) = X*Y makes C^n into the standard Hermitian
space, and an orthonormal basis for any Hermitian space will refer the form back to the
standard Hermitian space. The only significant difference between an arbitrary Euclidean
or Hermitian space and the standard Euclidean or Hermitian space is that no orthonormal
basis is preferred. Nevertheless, when working in such spaces we always use orthonormal
bases, though none have been picked out for us. A change of orthonormal bases will be
given by a matrix that is orthogonal or unitary, according to the case.

Corollary 8.5.1 Let V be a Euclidean or a Hermitian space, with positive definite form
( , ), and let W be a subspace of V. The form is nondegenerate on W, and therefore
V = W ⊕ W⊥.


Proof. If w is a nonzero vector in W, then (w, w) is a positive real number. It is not zero, 
and therefore w is not a null vector in V or in W. The nullspaces are zero. O 


What we have learned about symmetric forms allows us to interpret the length of a 
vector and the angle between two vectors v and w in a Euclidean space V. Let’s set aside the 
special case that these vectors are dependent, and assume that they span a two-dimensional 
subspace W. When we restrict the form, W becomes a Euclidean space of dimension 2. 
So W has an orthonormal basis (w_1, w_2), and via this basis, the vectors v and w will
have coordinate vectors in R^2. We’ll denote these two-dimensional coordinate vectors by
lowercase letters x and y. They aren’t the coordinate vectors that we would obtain using an
orthonormal basis for the whole space V, but we will have (v, w) = x^ty, and this allows us
to interpret geometric properties of the form in terms of dot product in R^2.

The length |v| of a vector v is defined by the formula |v|^2 = (v, v). If x is the coordinate
vector of v in R^2, then |v|^2 = x^tx. The law of cosines (x · y) = |x||y| cos θ in R^2 becomes


(8.5.2)    (v, w) = |v||w| cos θ,


where θ is the angle between x and y. Since this formula expresses cos θ in terms of the form,
it defines the unoriented angle θ between vectors v and w. But the ambiguity of sign in the
angle that arises because cos θ = cos(−θ) can’t be eliminated. When one views a plane in R^3
from its front and its back, the angles one sees differ by sign.
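
Formula (8.5.2) can be read as a recipe for computing the unoriented angle. A minimal sketch, assuming the dot product on R^n as the positive definite form; the function names are illustrative, and the clamp is only a guard against rounding error.

```python
import math

def dot(v, w):
    """Dot product form on R^n."""
    return sum(a * b for a, b in zip(v, w))

def length(v, form=dot):
    """|v| defined by |v|^2 = (v, v)."""
    return math.sqrt(form(v, v))

def angle(v, w, form=dot):
    """Unoriented angle theta in [0, pi] with (v, w) = |v||w| cos(theta)."""
    cos_theta = form(v, w) / (length(v, form) * length(w, form))
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding error
    return math.acos(cos_theta)

quarter = angle([1.0, 0.0], [1.0, 1.0])
right = angle([1.0, 0.0, 0.0], [0.0, 2.0, 0.0])
```

Because the form is symmetric, `angle(v, w)` and `angle(w, v)` agree, which reflects the fact that only the unoriented angle is defined.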


8.6 THE SPECTRAL THEOREM 


In this section, we analyze certain linear operators on a Hermitian space. 

Let T: V → V be a linear operator on a Hermitian space V, and let A be the matrix of
T with respect to an orthonormal basis B. The adjoint operator T*: V → V is the operator
whose matrix with respect to the same basis is the adjoint matrix A*.

If we change to a new orthonormal basis B’, the basechange matrix P will be unitary,
and the new matrix of T will have the form A’ = P*AP = P^{-1}AP. Its adjoint will be
A’* = P*A*P. This is the matrix of T* with respect to the new basis. So the definition of T*
makes sense: It is independent of the orthonormal basis.

The rules (8.3.5) for computing with adjoint matrices carry over to adjoint operators:


(8.6.1)    (T + U)* = T* + U*,   (TU)* = U*T*,   T** = T.


A normal matrix is a complex matrix A that commutes with its adjoint: A*A = AA*.
In itself, this isn’t a particularly important class of matrices, but it is the natural class for which
to state the Spectral Theorem that we prove in this section, and it includes two important
classes: Hermitian matrices (A* = A) and unitary matrices (A* = A^{-1}).
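
The three conditions are simple matrix identities, so they can be tested mechanically. The sketch below uses Python's built-in complex numbers and exact equality, which is only reliable for matrices whose entries are small integers or Gaussian integers, as in these hypothetical examples; the function names are illustrative.

```python
def adjoint(A):
    """Conjugate transpose A* of a complex matrix given as a list of rows."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_normal(A):
    """A*A = AA*."""
    return mat_mul(adjoint(A), A) == mat_mul(A, adjoint(A))

def is_hermitian(A):
    """A* = A."""
    return adjoint(A) == [[complex(x) for x in row] for row in A]

def is_unitary(A):
    """A*A = I."""
    n = len(A)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return mat_mul(adjoint(A), A) == identity

hermitian = [[2, 1j], [-1j, 2]]
rotation = [[0, -1], [1, 0]]     # real orthogonal, hence unitary
shear = [[1, 1], [0, 1]]         # neither Hermitian nor unitary, and not normal
```

With floating-point entries one would compare up to a tolerance instead of using `==`.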


Lemma 8.6.2 Let A be a complex n × n matrix and let P be an n × n unitary matrix. If A is
normal, Hermitian, or unitary, so is P*AP. □


A linear operator T on a Hermitian space is called normal, Hermitian, or unitary
if its matrix with respect to an orthonormal basis has the same property. So T is normal
if T*T = TT*, Hermitian if T* = T, and unitary if T*T = I. A Hermitian operator is
sometimes called a self-adjoint operator, but we won’t use that terminology.
The next proposition interprets these conditions in terms of the form. 


Proposition 8.6.3 Let T be a linear operator on a Hermitian space V, and let T* be the 
adjoint operator. 


(a) For all v and w in V, (Tv, w) = (v, T*w) and (v, Tw) = (T*v, w).
(b) T is normal if and only if, for all v and w in V, (Tv, Tw) = (T*v, T*w).
(c) T is Hermitian if and only if, for all v and w in V, (Tv, w) = (v, Tw).
(d) T is unitary if and only if, for all v and w in V, (Tv, Tw) = (v, w).


Proof. (a) Let A be the matrix of the operator T with respect to an orthonormal basis B.
With v = BX and w = BY as usual, (Tv, w) = (AX)*Y = X*A*Y and (v, T*w) = X*A*Y.
Therefore (Tv, w) = (v, T*w). The proof of the other formula of (a) is similar.


(b) We substitute T*v for v into the first equation of (a): (TT*v, w) = (T*v, T*w). Similarly,
substituting Tv for v into the second equation of (a): (Tv, Tw) = (T*Tv, w). So if T is
normal, then (Tv, Tw) = (T*v, T*w). The converse follows by applying Proposition 8.4.3
to the two vectors T*Tv and TT*v. The proofs of (c) and (d) are similar. □


Let T be a linear operator on a Hermitian space V. As before, a subspace W of V is
T-invariant if TW ⊂ W. A linear operator T will restrict to a linear operator on a T-invariant
subspace, and if T is normal, Hermitian, or unitary, the restricted operator will have the
same property. This follows from Proposition 8.6.3.


Proposition 8.6.4 Let T be a linear operator on a Hermitian space V and let W be a subspace
of V. If W is T-invariant, then the orthogonal space W⊥ is T*-invariant. If W is T*-invariant,
then W⊥ is T-invariant.


Proof. Suppose that W is T-invariant. To show that W⊥ is T*-invariant, we must show that
if u is in W⊥, then T*u is also in W⊥, which by definition of W⊥ means that (w, T*u) = 0
for all w in W. By Proposition 8.6.3, (w, T*u) = (Tw, u). Since W is T-invariant, Tw is in
W. Then since u is in W⊥, (Tw, u) = 0. So (w, T*u) = 0, as required. Since T** = T, one
obtains the second assertion by interchanging the roles of T and T*. □


The next theorem is the main place that we use the hypothesis that the form given on 
V be positive definite. 


Theorem 8.6.5 Let T be a normal operator on a Hermitian space V, and let v be an
eigenvector of T with eigenvalue λ. Then v is also an eigenvector of T*, with eigenvalue λ̄.


Proof. Case 1: λ = 0. Then Tv = 0, and we must show that T*v = 0. Since the form is
positive definite, it suffices to show that (T*v, T*v) = 0. By Proposition 8.6.3, (T*v, T*v) =
(Tv, Tv) = (0, 0) = 0.

Case 2: λ is arbitrary. Let S denote the linear operator T − λI. Then v is an eigenvector for
S with eigenvalue zero: Sv = 0. Moreover, S* = T* − λ̄I. You can check that S is a normal
operator. By Case 1, v is an eigenvector for S* with eigenvalue 0: S*v = T*v − λ̄v = 0. This
shows that v is an eigenvector of T* with eigenvalue λ̄. □


Theorem 8.6.6 Spectral Theorem for Normal Operators
(a) Let T be a normal operator on a Hermitian space V. There is an orthonormal basis of
V consisting of eigenvectors for T.


(b) Matrix form: Let A be a normal matrix. There is a unitary matrix P such that P*AP is
diagonal.


Proof. (a) We choose an eigenvector v_1 for T, and normalize its length to 1. Theorem 8.6.5
tells us that v_1 is also an eigenvector for T*. Therefore the one-dimensional subspace W
spanned by v_1 is T*-invariant. By Proposition 8.6.4, W⊥ is T-invariant. We also know that
V = W ⊕ W⊥. The restriction of T to any invariant subspace, including W⊥, is a normal
operator. By induction on dimension, we may assume that W⊥ has an orthonormal basis of
eigenvectors, say (v_2, ..., v_n). Adding v_1 to this set yields an orthonormal basis of V of
eigenvectors for T.


(b) This is proved from (a) in the usual way. We regard A as the matrix of the normal
operator of multiplication by A on C^n. By (a) there is an orthonormal basis B consisting of
eigenvectors. The matrix P of change of basis from E to B is unitary, and the matrix of the
operator with respect to the new basis, which is P*AP, is diagonal. □


The next corollaries are obtained by applying the Spectral Theorem to the two most
important types of normal matrices.

Corollary 8.6.7 Spectral Theorem for Hermitian Operators.
(a) Let T be a Hermitian operator on a Hermitian space V.


(i) There is an orthonormal basis of V consisting of eigenvectors of T. 
(ii) The eigenvalues of T are real numbers. 


(b) Matrix form: Let A be a Hermitian matrix. 
(i) There is a unitary matrix P such that P* A P is a real diagonal matrix. 


(ii) The eigenvalues of A are real numbers. 


Proof. Part (b)(ii) has been proved before (Theorem 8.3.11) and (a)(i) follows from the 
Spectral Theorem for normal operators. The other assertions are variants. O 


Corollary 8.6.8 Spectral Theorem for Unitary Matrices. 


(a) Let A be a unitary matrix. There is a unitary matrix P such that P*AP is diagonal.
(b) Every conjugacy class in the unitary group U_n contains a diagonal matrix. □


To diagonalize a Hermitian matrix M, one can proceed by determining its eigenvectors.
If the eigenvalues are distinct, the corresponding eigenvectors will be orthogonal,
and one can normalize their lengths to 1. This follows from the Spectral Theorem. For
example,

v_1 = \begin{bmatrix} 1 \\ -i \end{bmatrix} \quad and \quad v_2 = \begin{bmatrix} 1 \\ i \end{bmatrix}

are eigenvectors of the Hermitian matrix

M = \begin{bmatrix} 2 & i \\ -i & 2 \end{bmatrix}

with eigenvalues 3 and 1, respectively. We normalize their lengths to 1 by the factor 1/√2,
obtaining the unitary matrix

P = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -i & i \end{bmatrix}. \quad Then \quad P^{*}MP = \begin{bmatrix} 3 & \\ & 1 \end{bmatrix}.


However, the Spectral Theorem asserts that a Hermitian matrix can be diagonalized even
when its eigenvalues aren’t distinct. For instance, the only 2 × 2 Hermitian matrix whose
characteristic polynomial has a double root λ is λI.
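
The 2 × 2 example can be verified with exact complex arithmetic. In this sketch the vectors (1, −i)^t and (1, i)^t are taken as the eigenvectors of M = [[2, i], [−i, 2]]; the square root is avoided by computing V*MV for the unnormalized matrix V = [v_1 v_2] and dividing by (v_i, v_i) = 2, which gives the same result as P*MP. The helper names are illustrative.

```python
def hermitian_form(X, Y):
    """Standard Hermitian form (X, Y) = X*Y on C^n."""
    return sum(complex(x).conjugate() * y for x, y in zip(X, Y))

def mat_vec(A, X):
    """Return the product AX for a matrix A (list of rows) and a vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

M = [[2, 1j], [-1j, 2]]
v1 = [1, -1j]
v2 = [1, 1j]

Mv1 = mat_vec(M, v1)   # should equal 3 * v1
Mv2 = mat_vec(M, v2)   # should equal 1 * v2

# Entries of V*MV for V = [v1 v2]; dividing by (v_i, v_i) = 2 gives P*MP.
diagonalized = [[hermitian_form(a, mat_vec(M, b)) / 2 for b in (v1, v2)]
                for a in (v1, v2)]
```

The off-diagonal entries vanish because (v_1, v_2) = 0, as expected for eigenvectors of a Hermitian matrix with distinct eigenvalues.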


What we have proved for Hermitian matrices has analogues for real symmetric
matrices. A symmetric operator T on a Euclidean space V is a linear operator whose matrix
with respect to an orthonormal basis is symmetric. Similarly, an orthogonal operator T on a
Euclidean space V is a linear operator whose matrix with respect to an orthonormal basis is
orthogonal.

Proposition 8.6.9 Let T be a linear operator on a Euclidean space V.


(a) T is symmetric if and only if, for all v and w in V, (Tv, w) = (v, Tw).
(b) T is orthogonal if and only if, for all v and w in V, (Tv, Tw) = (v, w). □

Theorem 8.6.10 Spectral Theorem for Symmetric Operators.


(a) Let T be a symmetric operator on a Euclidean space V. 


(i) There is an orthonormal basis of V consisting of eigenvectors of T. 
(ii) The eigenvalues of T are real numbers. 


(b) Matrix form: Let A be a real symmetric matrix. 
(i) There is an orthogonal matrix P such that P^tAP is a real diagonal matrix.


(ii) The eigenvalues of A are real numbers. 


Proof. We have noted (b)(ii) before (Corollary 8.3.12), and (a)(ii) follows. Knowing this,
the proof of (a)(i) follows the pattern of the proof of Theorem 8.6.6. □


The Spectral Theorem is a powerful tool. When faced with a Hermitian operator or a 
Hermitian matrix, it should be an automatic response to apply that theorem. 


8.7 CONICS AND QUADRICS


Ellipses, hyperbolas, and parabolas are called conics. They are loci in R^2 defined by quadratic
equations f = 0, where


(8.7.1)    f(x_1, x_2) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 + b_1x_1 + b_2x_2 + c,


and the coefficients a_ij, b_i, and c are real numbers. (The reason that the coefficient of x_1x_2
is written as 2a_12 will be explained presently.) If the locus f = 0 of a quadratic equation is
not a conic, we call it a degenerate conic. A degenerate conic can be a pair of lines, a single
line, a point, or empty, depending on the equation. To emphasize that a particular locus is
not degenerate, we may sometimes refer to it as a nondegenerate conic. The term quadric is
used to designate an analogous locus in three or more dimensions.

We propose to describe the orbits of the conics under the action of the group of 
isometries of the plane. Two nondegenerate conics are in the same orbit if and only if they 
are congruent geometric figures. 

The quadratic part of the polynomial f(x_1, x_2) is called a quadratic form:


(8.7.2)    q(x_1, x_2) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2.


A quadratic form in any number of variables is a polynomial, each of whose terms has
degree 2 in the variables. It is convenient to express the quadratic form q in matrix notation.
To do this, we introduce the symmetric matrix


(8.7.3)    A = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}.


Then if X = (x_1, x_2)^t, the quadratic form can be written as q(x_1, x_2) = X^tAX. We put
the coefficient 2 into Formulas 8.7.1 and 8.7.2 in order to avoid some coefficients 1/2 in this
matrix. If we also introduce the 1 × 2 matrix B = [b_1 b_2], the equation f = 0 can be written
compactly in matrix notation as


(8.7.4)    X^tAX + BX + c = 0.


Theorem 8.7.5 Every nondegenerate conic is congruent to one of the following loci, where
the coefficients a_11 and a_22 are positive:


Ellipse:      a_{11}x_1^2 + a_{22}x_2^2 - 1 = 0,
Hyperbola:    a_{11}x_1^2 - a_{22}x_2^2 - 1 = 0,
Parabola:     a_{11}x_1^2 - x_2 = 0.


The coefficients a_11 and a_22 are determined by the congruence class of the conic, except that
they can be interchanged in the equation of an ellipse.


Proof. We simplify the equation (8.7.4) in two steps, first applying an orthogonal transfor- 
mation to diagonalize the matrix A and then applying a translation to eliminate the linear 
terms and the constant term when possible. 


The Spectral Theorem for symmetric operators (8.6.10) asserts that there is a 2 × 2
orthogonal matrix P such that P^tAP is diagonal. We make the change of variable PX’ = X,
and substitute into (8.7.4):


(8.7.6)    X'^{t}A'X' + B'X' + c = 0


where A’ = P^tAP and B’ = BP. With this orthogonal change of variable, the quadratic form
becomes diagonal, that is, the coefficient of x_1x_2 is zero. We drop the primes. When the
quadratic form is diagonal, f has the form


f(x_1, x_2) = a_{11}x_1^2 + a_{22}x_2^2 + b_1x_1 + b_2x_2 + c.

To continue, we eliminate b_i, by “completing squares” with the substitutions


(8.7.7)    x_i = x_i' - \frac{b_i}{2a_{ii}}.


This substitution corresponds to a translation of coordinates. Dropping primes again, f
becomes


(8.7.8)    f(x_1, x_2) = a_{11}x_1^2 + a_{22}x_2^2 + c = 0,


where the constant term c has changed. The new constant can be computed when needed.
When it is zero, the locus is degenerate. Assuming that c ≠ 0, we can multiply f by a scalar
to change c to −1. If the a_ii are both negative, the locus is empty, hence degenerate. So at least
one of the coefficients is positive, and we may assume that a_11 > 0. Then we are left with the
equations of the ellipses and the hyperbolas in the statement of the theorem.


The parabola arises because the substitution made to eliminate the linear coefficient
b_i requires a_ii to be nonzero. Since the equation f is supposed to be quadratic, these
coefficients aren’t both zero, and we may assume a_11 ≠ 0. If a_22 = 0 but b_2 ≠ 0, we eliminate
b_1 and use the substitution


(8.7.9)    x_2 = x_2' - c/b_2


to eliminate the constant term. Adjusting f by a scalar factor and eliminating degenerate
cases leaves us with the equation of the parabola.


Example 8.7.10 Let f be the quadratic polynomial x_1^2 + 2x_1x_2 − x_2^2 + 2x_1 + 2x_2 − 1. Then


A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad B = [2 \;\; 2], \quad and \quad c = -1.


The eigenvalues of A are ±√2. Setting a = √2 − 1 and b = √2 + 1, the vectors


v_1 = \begin{bmatrix} 1 \\ a \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1 \\ b \end{bmatrix}

are eigenvectors with eigenvalues √2 and −√2, respectively. They are orthogonal, and when
we normalize their lengths to 1, they will form an orthonormal basis B such that [B]^{-1}A[B]
is diagonal. Unfortunately, the square length of v_1 is 4 − 2√2. To normalize its length to 1,
we must divide by √(4 − 2√2). It is unpleasant to continue this computation by hand.

If a quadratic equation f(x, x2) = 0 is given, we can determine the type of conic that 
it represents most simply by allowing arbitrary changes of basis, not necessarily orthogonal 
ones. A nonorthogonal change will distort an ellipse but it will not change an ellipse into a 
hyperbola, a parabola, or a degenerate conic. If we wish only to identify the type of conic, 
arbitrary changes of basis are permissible. 

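We proceed as in (8.7.6), but with a nonorthogonal change of basis:


\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1' \\ x_2' \end{bmatrix}, \quad A' = P^{t}AP = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}, \quad B' = BP = [2 \;\; 0].

Dropping primes, the new equation becomes x_1^2 − 2x_2^2 + 2x_1 − 1 = 0, and completing the
square yields x_1^2 − 2x_2^2 − 2 = 0, a hyperbola. So the original locus is a hyperbola too.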

By the way, the matrix A is positive or negative definite in the equation of an ellipse 
and indefinite in the equation of a hyperbola. The matrix A shown above is indefinite. We 
could have seen right away that the locus we have just inspected was either a hyperbola or a 
degenerate conic. 
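
The quick test described in this remark amounts to looking at the sign of det A, since a real symmetric 2 × 2 matrix is definite when its determinant is positive and indefinite when it is negative. The sketch below applies it to Example 8.7.10 and also checks one nonorthogonal change of basis, P = [[1, −1], [0, 1]], which is an assumed choice that reproduces the equation x_1^2 − 2x_2^2 + 2x_1 − 1 = 0. The function names are illustrative.

```python
def conic_family(a11, a12, a22):
    """Narrow down the type of conic from the sign of det A, A = [[a11, a12], [a12, a22]]."""
    d = a11 * a22 - a12 * a12
    if d > 0:
        return "ellipse or degenerate"     # A is positive or negative definite
    if d < 0:
        return "hyperbola or degenerate"   # A is indefinite
    return "parabola or degenerate"        # A is singular

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

# Data of Example 8.7.10.
A = [[1, 1], [1, -1]]
B = [[2, 2]]
P = [[1, -1], [0, 1]]    # assumed nonorthogonal change of basis X = PX'

A_new = mat_mul(transpose(P), mat_mul(A, P))   # matrix of the new quadratic form
B_new = mat_mul(B, P)                          # new linear coefficients
family = conic_family(A[0][0], A[0][1], A[1][1])
```

The determinant test cannot by itself exclude the degenerate cases; those still have to be ruled out by looking at the constant term after completing squares.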


The method used to describe conics can be applied to classify quadrics in any dimension.
The general quadratic equation has the form f = 0, where


(8.7.11)    f(x_1, \ldots, x_n) = \sum_{i} a_{ii}x_i^2 + \sum_{i<j} 2a_{ij}x_ix_j + \sum_{i} b_ix_i + c.


Let matrices A and B be defined by


A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{1n} & \cdots & a_{nn} \end{bmatrix}, \quad B = [b_1 \; \cdots \; b_n].

Then
(8.7.12)    f(x_1, \ldots, x_n) = X^{t}AX + BX + c.


The associated quadratic form is
(8.7.13)    q(x_1, \ldots, x_n) = X^{t}AX.


According to the Spectral Theorem for symmetric operators, the matrix A can be diagonalized
by an orthogonal transformation P. When A is diagonal, the linear terms and the constant
term may be eliminated, so far as possible, as above. Here is the classification in three
variables:


Theorem 8.7.14 The congruence classes of nondegenerate quadrics in R^3 are represented
by the following loci, in which a_ii are positive real numbers:

Ellipsoids:                  a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 - 1 = 0,
One-sheeted hyperboloids:    a_{11}x_1^2 + a_{22}x_2^2 - a_{33}x_3^2 - 1 = 0,
Two-sheeted hyperboloids:    a_{11}x_1^2 - a_{22}x_2^2 - a_{33}x_3^2 - 1 = 0,
Elliptic paraboloids:        a_{11}x_1^2 + a_{22}x_2^2 - x_3 = 0,
Hyperbolic paraboloids:      a_{11}x_1^2 - a_{22}x_2^2 - x_3 = 0. □


A word is in order about the case that B and c are zero in the quadratic polynomial
f(x_1, x_2, x_3) (8.7.12), i.e., that f is equal to its quadratic form q (8.7.13). The locus {q = 0}
is considered degenerate, but is interesting. Let’s call it Q. Since all of the terms a_ij x_i x_j that
appear in q have degree 2,


(8.7.15)    q(λx_1, λx_2, λx_3) = λ^2 q(x_1, x_2, x_3)


for any real number λ. Consequently, if a point X ≠ 0 lies on Q, i.e., if q(X) = 0, then
q(λX) = 0 too, so λX lies on Q for every real number λ. Therefore Q is a union of lines
through the origin, a double cone.

For example, suppose that q is the diagonal quadratic form


a_{11}x_1^2 + a_{22}x_2^2 - x_3^2,

where a_ii are positive. When we intersect the locus Q with the plane x_3 = 1, we obtain
an ellipse a_11x_1^2 + a_22x_2^2 = 1 in the remaining variables. In this case Q is the union of lines
through the origin and the points of this ellipse.


(8.7.16) Hyperboloids Near to a Cone. 


Notice that q(x) is positive in the exterior of the double cone, and negative in its interior.
(The value of q(x) changes sign only when one crosses Q.) So for any r > 0, the locus
$a_1x_1^2 + a_2x_2^2 - x_3^2 - r = 0$ lies in the exterior of the double cone. It is a one-sheeted
hyperboloid, while the locus $a_1x_1^2 + a_2x_2^2 - x_3^2 + r = 0$ lies in the interior, and is a two-sheeted
hyperboloid.

Similar reasoning can be applied to any homogeneous polynomial $g(x_1, \ldots, x_n)$, any
polynomial in which all of the terms have the same degree d. If g is homogeneous of degree
d, $g(\lambda x) = \lambda^d g(x)$, and because of this, the locus {g = 0} will also be a union of lines
through the origin.
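The scaling argument can be checked numerically. The sketch below is only an illustration: the coefficients $a_1 = 4$, $a_2 = 9$ and the sample points are assumptions chosen so that exact rational arithmetic applies, and the helper names `q` and `scale` are not from the text.

```python
from fractions import Fraction

# Illustrative coefficients (assumptions, not from the text): a1 = 4, a2 = 9.
A1, A2 = Fraction(4), Fraction(9)

def q(x):
    """Diagonal quadratic form q(x) = a1*x1^2 + a2*x2^2 - x3^2."""
    x1, x2, x3 = x
    return A1 * x1**2 + A2 * x2**2 - x3**2

def scale(lam, x):
    """Return the scalar multiple lam * x."""
    return tuple(lam * xi for xi in x)

# A point of the ellipse 4*x1^2 + 9*x2^2 = 1 in the plane x3 = 1, hence a point of Q.
on_cone = (Fraction(3, 10), Fraction(4, 15), Fraction(1))
sample = (Fraction(1), Fraction(2), Fraction(5))

for lam in (Fraction(-2), Fraction(1, 3), Fraction(7)):
    # Homogeneity of degree 2: q(lam * x) = lam^2 * q(x).
    assert q(scale(lam, sample)) == lam**2 * q(sample)
    # Every multiple of a point of Q lies on Q again.
    assert q(scale(lam, on_cone)) == 0
```

With these coefficients, `q` is positive at (1, 0, 0), a point in the exterior of the cone, and negative at (0, 0, 1), a point in its interior.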


8.8 SKEW-SYMMETRIC FORMS 

The description of skew-symmetric bilinear forms is the same for any field of scalars, so in 
this section we allow vector spaces over an arbitrary field F. However, as usual, it may be
best to think of real vector spaces when going through this for the first time.

A bilinear form ( , ) on a vector space V is skew-symmetric if it has either one of the
following equivalent properties:


(8.8.1) $(v, v) = 0$ for all $v$ in $V$, or


(8.8.2) $(u, v) = -(v, u)$ for all $u$ and $v$ in $V$.


To be more precise, these conditions are equivalent whenever the field of scalars has 
characteristic different from 2. If F has characteristic 2, the first condition (8.8.1) is the 
correct one. The fact that (8.8.1) implies (8.8.2) is proved by expanding (u + v, u + v): 


$(u+v, u+v) = (u, u) + (u, v) + (v, u) + (v, v)$,


and using the fact that $(u, u) = (v, v) = (u+v, u+v) = 0$. Conversely, if the second
condition holds, then setting u = v gives us $(v, v) = -(v, v)$, hence $2(v, v) = 0$, and it follows
that $(v, v) = 0$, unless 2 = 0.

A bilinear form ( , ) is skew-symmetric if and only if its matrix A with respect to an
arbitrary basis is a skew-symmetric matrix, meaning that $a_{ji} = -a_{ij}$ and $a_{ii} = 0$, for all i and
j. Except in characteristic 2, the condition $a_{ii} = 0$ follows from $a_{ji} = -a_{ij}$ when one sets
i = j.

The determinant form (X, Y) on $\mathbb{R}^2$, the form defined by

(8.8.3) $(X, Y) = \det \begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \end{bmatrix} = x_1y_2 - x_2y_1$,


is a simple example of a skew-symmetric form. Linearity and skew symmetry in the columns 
are familiar properties of the determinant. The matrix of the determinant form (8.8.3) with 
respect to the standard basis of $\mathbb{R}^2$ is


(8.8.4) $S = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.


We will see in Theorem 8.8.7 below that every nondegenerate skew-symmetric form looks 
very much like this one. 
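As a small check of these statements, the following sketch evaluates the form $X^tSY$ for the matrix S of (8.8.4) and compares it with the determinant formula (8.8.3). The helper names `form` and `is_skew_symmetric` and the sample vectors are assumptions made for the illustration.

```python
def form(A, X, Y):
    """Bilinear form X^t A Y for a square matrix A given as a list of rows."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))

def is_skew_symmetric(A):
    """True if a_ii = 0 and a_ji = -a_ij for all i, j."""
    n = len(A)
    return all(A[i][i] == 0 for i in range(n)) and all(
        A[j][i] == -A[i][j] for i in range(n) for j in range(n))

S = [[0, 1], [-1, 0]]        # matrix (8.8.4) of the determinant form
X, Y = [3, 5], [2, 7]        # arbitrary sample vectors

det_XY = X[0] * Y[1] - X[1] * Y[0]      # x1*y2 - x2*y1, as in (8.8.3)
assert form(S, X, Y) == det_XY          # the matrix S represents the determinant form
assert form(S, Y, X) == -form(S, X, Y)  # property (8.8.2)
assert form(S, X, X) == 0               # property (8.8.1)
```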


Skew-symmetric forms also come up when one counts intersections of paths on a 
surface. To obtain a count that doesn’t change when the paths are deformed, one can adopt 
the rule used for traffic flow: A vehicle that enters an intersection from the right has the 
right of way. If two paths X and Y on the surface intersect at a point p, we define the
intersection number $(X, Y)_p$ at p as follows: If X enters the intersection to the right of
Y, then $(X, Y)_p = 1$, and if X enters to the left of Y, then $(X, Y)_p = -1$. Then in either
case, $(X, Y)_p = -(Y, X)_p$. The total intersection number (X, Y) is obtained by adding these
contributions for all intersection points. In this way the contributions arising when X crosses
Y and then turns back to cross again cancel. This is how topologists define a product in
“homology.”

(8.8.5) Oriented Intersections (X, Y).


Many of the definitions given in Section 8.4 can be used also with skew-symmetric 
forms. In particular, two vectors v and w are orthogonal if $(v, w) = 0$. It is true once more
that $v \perp w$ if and only if $w \perp v$, but there is a difference: When the form is skew-symmetric,
every vector v is self-orthogonal: $v \perp v$. And since all vectors are self-orthogonal, there can
be no orthogonal bases.

As is true for symmetric forms, a skew-symmetric form is nondegenerate if and only if 
its matrix with respect to an arbitrary basis is nonsingular. The proof of the next theorem is 
the same as for Theorem 8.4.5. 


Theorem 8.8.6 Let (, ) be a skew-symmetric form on a vector space V, and let W be a 
subspace of V on which the form is nondegenerate. Then V is the orthogonal sum $W \oplus W^\perp$.
If the form is nondegenerate on V and on W, it is nondegenerate on $W^\perp$ too. □


Theorem 8.8.7 


(a) Let V be a vector space of positive dimension m over a field F, and let ( , ) be a
nondegenerate skew-symmetric form on V. The dimension of V is even, and V has a
basis B such that the matrix $S_0$ of the form with respect to that basis is made up of
diagonal blocks, where all blocks are equal to the 2 × 2 matrix S shown above (8.8.4):


$S_0 = \begin{bmatrix} S & & \\ & \ddots & \\ & & S \end{bmatrix}$.


(b) Matrix form: Let A be an invertible skew-symmetric m × m matrix. There is an invertible
matrix P such that $P^tAP = S_0$ is as above.


Proof. (a) Since the form is nondegenerate, we may choose nonzero vectors $v_1$ and $v_2$ such
that $(v_1, v_2) = c$ is not zero. We adjust $v_2$ by a scalar factor to make c = 1. Since $(v_1, v_2) \neq 0$
but $(v_1, v_1) = 0$, these vectors are independent. Let W be the two-dimensional subspace with
basis $(v_1, v_2)$. The matrix of the form restricted to W is S. Since this matrix is invertible, the
form is nondegenerate on W, so V is the direct sum $W \oplus W^\perp$, and the form is nondegenerate
on $W^\perp$. By induction, we may assume that there is a basis $(v_3, \ldots, v_m)$ for $W^\perp$ such that
the matrix of the form on this subspace has the form (8.8.7). Then $(v_1, v_2, v_3, \ldots, v_m)$ is the
required basis for V. □
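The induction in this proof can be turned into a procedure. The sketch below is one possible reading of it, not a method given in the text: it works over the rational numbers with exact arithmetic, the function names `form` and `symplectic_basis` are made up for the illustration, and the remaining vectors are pushed into $W^\perp$ by the correction $u \mapsto u + (v_2, u)v_1 - (v_1, u)v_2$.

```python
from fractions import Fraction

def form(A, x, y):
    """Value x^t A y of the bilinear form with matrix A."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def symplectic_basis(A):
    """Given an invertible skew-symmetric matrix A over the rationals, return
    vectors v1, v2, ..., vm whose Gram matrix is made of diagonal blocks S."""
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    # The candidates always span the current orthogonal space; start with the standard basis.
    candidates = [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]
    basis = []
    while candidates:
        v1 = candidates.pop(0)
        # Nondegeneracy on the current space guarantees a partner w with (v1, w) != 0.
        k = next((k for k, w in enumerate(candidates) if form(A, v1, w) != 0), None)
        if k is None:
            raise ValueError("the form is degenerate")
        w = candidates.pop(k)
        c = form(A, v1, w)
        v2 = [wi / c for wi in w]            # rescale so that (v1, v2) = 1
        # Replace each remaining u by u + (v2,u) v1 - (v1,u) v2, which is
        # orthogonal to both v1 and v2.
        adjusted = []
        for u in candidates:
            a, b = form(A, v2, u), form(A, v1, u)
            adjusted.append([ui + a * p - b * r for ui, p, r in zip(u, v1, v2)])
        candidates = adjusted
        basis += [v1, v2]
    return basis

# Usage: an invertible 4 x 4 skew-symmetric matrix (sample data).
A = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
B = symplectic_basis(A)
gram = [[form(A, x, y) for y in B] for x in B]   # entries of P^t A P, P having columns B
```

If the matrix has odd size, the loop eventually reaches a single leftover vector with no partner and raises `ValueError`, in agreement with the statement that the dimension must be even.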
Corollary 8.8.8 If A is an invertible m × m skew-symmetric matrix, then m is an even integer.
□


Let ( , ) be a nondegenerate skew-symmetric form on a vector space of dimension 2n.
We rearrange the basis referred to in Theorem 8.8.7 as $(v_1, v_3, \ldots, v_{2n-1}; v_2, v_4, \ldots, v_{2n})$.
The matrix will be changed into a block matrix made up of n × n blocks


(8.8.9) $\begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$.
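The effect of this rearrangement can be verified for a particular n. In the sketch below, the helper names and the choice n = 3 are assumptions for the illustration; indices are 0-based, so the odd-numbered vectors $v_1, v_3, \ldots$ sit at positions 0, 2, ....

```python
def block_diagonal_S(n):
    """2n x 2n matrix made of n diagonal blocks S = [[0, 1], [-1, 0]]."""
    M = [[0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        M[2 * k][2 * k + 1] = 1
        M[2 * k + 1][2 * k] = -1
    return M

def reorder(M, order):
    """Matrix of the same form after permuting the basis according to `order`."""
    return [[M[i][j] for j in order] for i in order]

def block_J(n):
    """2n x 2n block matrix with blocks 0, I, -I, 0."""
    M = [[0] * (2 * n) for _ in range(2 * n)]
    for a in range(n):
        M[a][n + a] = 1
        M[n + a][a] = -1
    return M

n = 3
order = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))   # v1, v3, v5; v2, v4, v6
assert reorder(block_diagonal_S(n), order) == block_J(n)
```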


8.9 SUMMARY 


We collect some of the terms that we have used together here. They are used for a symmetric 
or a skew-symmetric form on a real vector space and also for a Hermitian form on a complex 
vector space. 


orthogonal vectors: Two vectors v and w are orthogonal (written $v \perp w$) if $(v, w) = 0$.
orthogonal space to a subspace: The orthogonal space $W^\perp$ to a subspace W of V is the set
of vectors v that are orthogonal to every vector in W:


$W^\perp = \{v \in V \mid (v, W) = 0\}$.


null vector: A null vector is a vector that is orthogonal to every vector in V.


nullspace: The nullspace N of the given form is the set of null vectors:
$N = \{v \mid v \perp V\} = V^\perp$.


nondegenerate form: The form is nondegenerate if its nullspace is the zero space {0}. This
means that for every nonzero vector v, there is a vector v′ such that $(v, v') \neq 0$.


nondegeneracy on a subspace: The form is nondegenerate on a subspace W if its restriction
to W is a nondegenerate form, or if $W \cap W^\perp = \{0\}$. If the form is nondegenerate on a
subspace W, then $V = W \oplus W^\perp$.


orthogonal basis: A basis $B = (v_1, \ldots, v_n)$ of V is orthogonal if the vectors are mutually
orthogonal, that is, if $(v_i, v_j) = 0$ for all indices i and j with $i \neq j$. The matrix of the form
with respect to an orthogonal basis is a diagonal matrix. Orthogonal bases exist for any
symmetric or Hermitian form, but not for a skew-symmetric form.


orthonormal basis: A basis $B = (v_1, \ldots, v_n)$ is orthonormal if $(v_i, v_j) = 0$ for $i \neq j$ and
$(v_i, v_i) = 1$. An orthonormal basis for a symmetric or Hermitian form exists if and only if
the form is positive definite.


orthogonal projection: If a symmetric or Hermitian form is nondegenerate on a subspace
W, the orthogonal projection to W is the unique linear transformation $\pi: V \to W$ such that:
$\pi(v) = v$ if v is in W, and $\pi(v) = 0$ if v is in the orthogonal space $W^\perp$.

If the form is nondegenerate on a subspace W and if $(w_1, \ldots, w_k)$ is an orthogonal
basis for W, the orthogonal projection is given by the formula $\pi(v) = w_1c_1 + \cdots + w_kc_k$,
where


$c_i = \dfrac{(w_i, v)}{(w_i, w_i)}$.
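The projection formula is easy to try out with dot product on $\mathbb{R}^3$. The sketch below is an illustration under that assumption; the function names and the sample vectors are not from the text, and exact fractions are used so that the orthogonality of the residual can be tested exactly.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product, used here as the symmetric form."""
    return sum(a * b for a, b in zip(x, y))

def project(v, orthogonal_basis, form=dot):
    """pi(v) = w_1 c_1 + ... + w_k c_k with c_i = (w_i, v) / (w_i, w_i)."""
    result = [Fraction(0)] * len(v)
    for w in orthogonal_basis:
        c = Fraction(form(w, v)) / form(w, w)
        result = [r + c * wi for r, wi in zip(result, w)]
    return result

# W is spanned by two orthogonal vectors (sample data).
w1, w2 = [1, 1, 0], [1, -1, 1]
v = [1, 0, 0]
p = project(v, [w1, w2])                      # the projection of v to W
residual = [vi - pi for vi, pi in zip(v, p)]  # v - pi(v) lies in the orthogonal space
assert dot(residual, w1) == 0 and dot(residual, w2) == 0
```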


Spectral Theorem: 

• If A is normal, there is a unitary matrix P such that $P^*AP$ is diagonal.

• If A is Hermitian, there is a unitary matrix P such that $P^*AP$ is a real diagonal matrix.

• In the unitary group $U_n$, every matrix is conjugate to a diagonal matrix.

• If A is a real symmetric matrix, there is an orthogonal matrix P such that $P^tAP$ is diagonal.


The table below compares various concepts used for real and for complex vector 
spaces. 


            Real Vector Spaces            Complex Vector Spaces

forms       symmetric                     Hermitian
            $(v, w) = (w, v)$             $(v, w) = \overline{(w, v)}$

matrices    symmetric                     Hermitian
            $A^t = A$                     $A^* = A$
            orthogonal                    unitary
            $A^tA = I$                    $A^*A = I$
                                          normal
                                          $A^*A = AA^*$

operators   symmetric                     Hermitian
            $(Tv, w) = (v, Tw)$           $(Tv, w) = (v, Tw)$
            orthogonal                    unitary
            $(v, w) = (Tv, Tw)$           $(v, w) = (Tv, Tw)$
                                          normal
                                          $(Tv, Tw) = (T^*v, T^*w)$
                                          arbitrary
                                          $(v, Tw) = (T^*v, w)$


In helping geometry, modern algebra is helping itself above all. 


—Oscar Zariski 
EXERCISES


Section 1 Real Bilinear Forms


1.1. Show that a bilinear form ( , ) on a real vector space V is a sum of a symmetric form and
a skew-symmetric form.


Section 2 Symmetric Forms


2.1. Prove that the maximal entries of a positive definite, symmetric, real matrix are on the
diagonal.


2.2. Let A and A′ be symmetric matrices related by $A' = P^tAP$, where P is invertible. Is it
true that the ranks of A and of A′ are equal?


Section 3 Hermitian Forms


3.1. Is a complex n × n matrix A such that $X^*AX$ is real for all X a Hermitian matrix?


3.2. Let ( , ) be a positive definite Hermitian form on a complex vector space V, and let { , }
and [ , ] be its real and imaginary parts, the real-valued forms defined by


$(v, w) = \{v, w\} + [v, w]i$.
Prove that when V is made into a real vector space by restricting scalars to $\mathbb{R}$, { , } is a
positive definite symmetric form, and [ , ] is a skew-symmetric form.


3.3. The set of n × n Hermitian matrices forms a real vector space. Find a basis for this space.


3.4. Prove that if A is an invertible matrix, then $A^*A$ is Hermitian and positive definite.


3.5. Let A and B be positive definite Hermitian matrices. Decide which of the following
matrices are necessarily positive definite Hermitian: $A^2$, $A^{-1}$, $AB$, $A + B$.


3.6. Use the characteristic polynomial to prove that the eigenvalues of a 2 × 2 Hermitian
matrix A are real.


Section 4 Orthogonality 


4.1. What is the inverse of a matrix whose columns are orthogonal?


4.2. Let ( , ) be a bilinear form on a real vector space V, and let v be a vector such that
$(v, v) \neq 0$. What is the formula for orthogonal projection to the space $W = v^\perp$ orthogonal
to v?


4.3. Let A be a real m × n matrix. Prove that $B = A^tA$ is positive semidefinite, i.e., that
$X^tBX \geq 0$ for all X, and that A and B have the same rank.


4.4. Make a sketch showing the positions of some orthogonal vectors in $\mathbb{R}^2$, when the form is
$(X, Y) = x_1y_1 - x_2y_2$.


4.5. Find an orthogonal basis for the form on $\mathbb{R}^n$ whose matrix is
T Og 
Lica 
(a) |; gale), Oe, 2a 
ee | 


4.6. Extend the vector $X_1 = \frac{1}{2}(1, -1, 1, 1)^t$ to an orthonormal basis for $\mathbb{R}^4$.


4.7. Apply the Gram–Schmidt procedure to the basis $(1, 1, 0)^t$, $(1, 0, 1)^t$, $(0, 1, 1)^t$ of $\mathbb{R}^3$.


4.8.
Ph il
LetAs= i | Find an orthonormal basis for $\mathbb{R}^2$ with respect to the form $X^tAY$.


4.9. Find an orthonormal basis for the vector space P of all real polynomials of degree at most
2, with the symmetric form defined by


$(f, g) = \int_{-1}^{1} f(x)\,g(x)\,dx$.


4.10. Let V denote the vector space of real n × n matrices. Prove that $(A, B) = \operatorname{trace}(A^tB)$
defines a positive definite bilinear form on V, and find an orthonormal basis for this form.


4.11. Let $W_1$, $W_2$ be subspaces of a vector space V with a symmetric bilinear form. Prove
(a) $(W_1 + W_2)^\perp = W_1^\perp \cap W_2^\perp$, (b) $W \subset W^{\perp\perp}$, (c) If $W_1 \subset W_2$, then $W_1^\perp \supset W_2^\perp$.


4.12. Let $V = \mathbb{R}^{2\times 2}$ be the vector space of real 2 × 2 matrices.
(a) Determine the matrix of the bilinear form $(A, B) = \operatorname{trace}(AB)$ on V with respect to
the standard basis $\{e_{ij}\}$.
(b) Determine the signature of this form.
(c) Find an orthogonal basis for this form.
(d) Determine the signature of the form trace AB on the space $\mathbb{R}^{n\times n}$ of real n × n matrices.


*4.13. (a) Decide whether or not the rule $(A, B) = \operatorname{trace}(A^*B)$ defines a Hermitian form on
the space $\mathbb{C}^{n\times n}$ of complex matrices, and if so, determine its signature.
(b) Answer the same question for the form defined by $(A, B) = \operatorname{trace}(\overline{A}B)$.


4.14. The matrix form of Theorem 8.4.10 asserts that if A is a real symmetric matrix, there
exists an invertible matrix P such that $P^tAP$ is diagonal. Prove this by row and column
operations.


4.15. Let W be the subspace of $\mathbb{R}^3$ spanned by the vectors $(1, 1, 0)^t$ and $(0, 1, 1)^t$. Determine
the orthogonal projection of the vector $(1, 0, 0)^t$ to W.


4.16. Let V be the real vector space of 3 × 3 matrices with the bilinear form $(A, B) = \operatorname{trace} A^tB$,
and let W be the subspace of skew-symmetric matrices. Compute the orthogonal projection
to W with respect to this form, of the matrix


2
Cesta!
isu


4.17. Use the method of (3.5.13) to compute the coordinate vector of the vector $(x_1, x_2, x_3)^t$
with respect to the basis B described in Example 8.4.14, and compare your answer with
the projection formula.


4.18. Find the matrix of a projection $\pi: \mathbb{R}^3 \to \mathbb{R}^2$ such that the image of the standard bases of
$\mathbb{R}^3$ forms an equilateral triangle and $\pi(e_1)$ points in the direction of the x-axis.


4.19. Let W be a two-dimensional subspace of $\mathbb{R}^3$, and consider the orthogonal projection $\pi$ of
$\mathbb{R}^3$ onto W. Let $(a_i, b_i)^t$ be the coordinate vector of $\pi(e_i)$, with respect to a chosen
orthonormal basis of W. Prove that $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$ are orthogonal unit vectors.
4.20. Prove the criterion for positive definiteness given in Theorem 8.4.19. Does the criterion
carry over to Hermitian matrices?


4.21. Prove Sylvester’s Law (see 8.4.17).
Hint: Begin by showing that if $W_1$ and $W_2$ are subspaces of V and if the form is positive
definite on $W_1$ and negative semi-definite on $W_2$, then $W_1$ and $W_2$ are independent.


Section 5 Euclidean Spaces and Hermitian Spaces 


5.1. Let V be a Euclidean space.
(a) Prove the Schwarz inequality $|(v, w)| \leq |v||w|$.
(b) Prove the parallelogram law $|v + w|^2 + |v - w|^2 = 2|v|^2 + 2|w|^2$.
(c) Prove that if $|v| = |w|$, then $(v + w) \perp (v - w)$.


5.2. Let W be a subspace of a Euclidean space V. Prove that $W = W^{\perp\perp}$.


*5.3. Let $w \in \mathbb{R}^n$ be a vector of length 1, and let U denote the orthogonal space $w^\perp$. The
reflection $r_w$ about U is defined as follows: We write a vector v in the form $v = cw + u$,
where $u \in U$. Then $r_w(v) = -cw + u$.
(a) Prove that the matrix $P = I - 2ww^t$ is orthogonal.
(b) Prove that multiplication by P is a reflection about the orthogonal space U.
(c) Let u, v be vectors of equal length in $\mathbb{R}^n$. Determine a vector w such that $Pu = v$.


5.4. Let T be a linear operator on $V = \mathbb{R}^n$ whose matrix A is a real symmetric matrix.
(a) Prove that V is the orthogonal sum $V = (\ker T) \oplus (\operatorname{im} T)$.
(b) Prove that T is an orthogonal projection onto im T if and only if, in addition to being
symmetric, $A^2 = A$.


5.5. Let P be a unitary matrix, and let $X_1$ and $X_2$ be eigenvectors for P, with distinct
eigenvalues $\lambda_1$ and $\lambda_2$. Prove that $X_1$ and $X_2$ are orthogonal with respect to the standard
Hermitian form on $\mathbb{C}^n$.


5.6. What complex numbers might occur as eigenvalues of a unitary matrix?


Section 6 The Spectral Theorem


6.1. Prove Proposition 8.6.3(c), (d).


6.2. Let T be a symmetric operator on a Euclidean space. Using Proposition 8.6.9, prove that
if v is a vector and if $T^2v = 0$, then $Tv = 0$.


6.3. What does the Spectral Theorem tell us about a real 3 × 3 matrix that is both symmetric
and orthogonal?


6.4. What can be said about a matrix A such that $A^*A$ is diagonal?


6.5. Prove that if A is a real skew-symmetric matrix, then iA is a Hermitian matrix. What
does the Spectral Theorem tell us about a real skew-symmetric matrix?


6.6. Prove that an invertible matrix A is normal if and only if $A^*A^{-1}$ is unitary.


6.7. Let P be a real matrix that is normal and has real eigenvalues. Prove that P is
symmetric.


6.8. Let V be the space of differentiable complex-valued functions on the unit circle in the
complex plane, and for $f, g \in V$, define


$(f, g) = \dfrac{1}{2\pi} \int_0^{2\pi} \overline{f(\theta)}\,g(\theta)\,d\theta$.


(a) Show that this form is Hermitian and positive definite.
(b) Let W be the subspace of V of functions $f(e^{i\theta})$, where f is a polynomial of degree
$\leq n$. Find an orthonormal basis for W.
(c) Show that $T = i\frac{d}{d\theta}$ is a Hermitian operator on V, and determine its eigenvalues
on W.


6.9. Determine the signature of the form on $\mathbb{R}^2$ whose matrix is k : | and determine an
orthogonal matrix P such that $P^tAP$ is diagonal.


6.10. Prove that if T is a Hermitian operator on a Hermitian space V, the rule $\{v, w\} = (v, Tw)$
defines a second Hermitian form on V.


6.11. Prove that eigenvectors associated to distinct eigenvalues of a Hermitian matrix A are
orthogonal.


6.12.
aes
Find a unitary matrix P so that $P^*AP$ is diagonal, when A = ae


6.13. Find a real orthogonal matrix P so that $P^tAP$ is diagonal, when A is the
matrix


12 lt at Onl
(a) E A , Deel 1 cep 18
shoe tee se)


6.14. Prove that a real symmetric matrix A is positive definite if and only if its eigenvalues are
positive.


6.15. Prove that for any square matrix A, $\ker A = (\operatorname{im} A^*)^\perp$, and that if A is normal,
$\ker A = (\operatorname{im} A)^\perp$.


*6.16. Let $\zeta = e^{2\pi i/n}$, and let A be the n × n matrix whose entries are $a_{jk} = \zeta^{jk}/\sqrt{n}$. Prove that
A is unitary.


*6.17. Let A, B be Hermitian matrices that commute. Prove that there is a unitary matrix P such
that $P^*AP$ and $P^*BP$ are both diagonal.


6.18. Use the Spectral Theorem to prove that a positive definite real symmetric n × n matrix A
has the form $A = P^tP$ for some P.


6.19. Prove that the cyclic shift operator


is unitary, and determine its diagonalization.
6.20. Prove that the circulant, the matrix below, is normal.


$\begin{bmatrix} c_0 & c_1 & \cdots & c_n \\ c_n & c_0 & \cdots & c_{n-1} \\ \vdots & & & \vdots \\ c_1 & c_2 & \cdots & c_0 \end{bmatrix}$


6.21. What conditions on the eigenvalues of a normal matrix A imply that A is Hermitian? 
That A is unitary? 


6.22. Prove the Spectral Theorem for symmetric operators. 


Section 7 Conics and Quadrics 


7.1. Determine the type of the quadric $x^2 + 4xy + 2xz + z^2 + 3x + z - 6 = 0$.


7.2. Suppose that the quadratic equation (8.7.1) represents an ellipse. Instead of diagonalizing 
the form and then making a translation to reduce to the standard type, we could make 
the translation first. How can one determine the required translation? 


7.3. Give a necessary and sufficient condition, in terms of the coefficients of its equation, for 
a conic to be a circle. 


7.4. Describe the degenerate quadrics geometrically. 


Section 8 Skew-Symmetric Forms 
8.1. Let A be an invertible, real, skew-symmetric matrix. Prove that $A^2$ is symmetric and
negative definite.


8.2. Let W be a subspace on which a real skew-symmetric form is nondegenerate. Find a
formula for the orthogonal projection $\pi: V \to W$.


8.3. Let S be a real skew-symmetric matrix. Prove that $I + S$ is invertible, and that $(I - S)(I + S)^{-1}$
is orthogonal.


*8.4. Let A be a real skew-symmetric matrix.
(a) Prove that $\det A \geq 0$.
(b) Prove that if A has integer entries, then det A is the square of an integer. 


Miscellaneous Problems 


M.1. According to Sylvester’s Law, every 2 × 2 real symmetric matrix is congruent to exactly one
of six standard types. List them. If we consider the operation of $GL_2$ on 2 × 2 matrices by
$P * A = PAP^t$, then Sylvester’s Law asserts that the symmetric matrices form six orbits.
We may view the symmetric matrices as points in $\mathbb{R}^3$, letting (x, y, z) correspond to the
matrix $\begin{bmatrix} x & y \\ y & z \end{bmatrix}$. Describe the decomposition of $\mathbb{R}^3$ into orbits geometrically, and make a
clear drawing depicting it.
Hint: If you don’t get a beautiful result, you haven’t understood the configuration.


M.2. Describe the symmetry of the matrices AB + BA and AB − BA in the following cases.
(a) A, B symmetric, (b) A, B Hermitian, (c) A, B skew-symmetric,
(d) A symmetric, B skew-symmetric.


M.3. With each of the following types of matrices, describe the possible determinants and
eigenvalues.
(a) real orthogonal, (b) unitary, (c) Hermitian, (d) real symmetric, negative
definite, (e) real skew-symmetric.


M.4. Let E be an m × n complex matrix. Prove that the matrix * on is invertible.


M.5. The vector cross product is $x \times y = (x_2y_3 - x_3y_2,\; x_3y_1 - x_1y_3,\; x_1y_2 - x_2y_1)^t$. Let v be a fixed
vector in $\mathbb{R}^3$, and let T be the linear operator $T(x) = (x \times v) \times v$.
(a) Show that this operator is symmetric. You may use general properties of the scalar
triple product $\det[x|y|z] = (x \times y) \cdot z$, but not the matrix of the operator.
(b) Compute the matrix.


M.6. (a) What is wrong with the following argument? Let P be a real orthogonal matrix.
Let X be a (possibly complex) eigenvector of P, with eigenvalue $\lambda$. Then $X^tP^tX =
(PX)^tX = \lambda X^tX$. On the other hand, $X^tP^tX = X^t(P^{-1}X) = \lambda^{-1}X^tX$. Therefore
$\lambda = \lambda^{-1}$, and so $\lambda = \pm 1$.
(b) State and prove a correct theorem based on the error in this argument.


*M.7. Let A be a real m × n matrix. Prove that there are orthogonal matrices P in $O_m$ and Q
in $O_n$ such that PAQ is diagonal, with non-negative diagonal entries.


M.8. (a) Show that if A is a nonsingular complex matrix, there is a positive definite Hermitian
matrix B such that $B^2 = A^*A$, and that B is uniquely determined by A.
(b) Let A be a nonsingular matrix, and let B be a positive definite Hermitian matrix such
that $B^2 = A^*A$. Show that $AB^{-1}$ is unitary.
(c) Prove the Polar decomposition: Every nonsingular matrix A is a product A = UP,
where P is positive definite Hermitian and U is unitary.
(d) Prove that the Polar decomposition is unique.
(e) What does this say about the operation of left multiplication by the unitary group $U_n$
on the group $GL_n$?


*M.9. Let V be a Euclidean space of dimension n, and let $S = (v_1, \ldots, v_k)$ be a set of vectors
in V. A positive combination of S is a linear combination $p_1v_1 + \cdots + p_kv_k$ in which all
coefficients $p_i$ are positive. The subspace $U = \{v \mid (v, w) = 0\}$ of V of vectors orthogonal
to a vector w is called a hyperplane. A hyperplane divides the space V into two half
spaces $\{v \mid (v, w) \geq 0\}$ and $\{v \mid (v, w) \leq 0\}$.
(a) Prove that the following are equivalent:
• S is not contained in any half space.
• For every nonzero vector w in V, $(v_i, w) < 0$ for some $i = 1, \ldots, k$.
(b) Let S′ be the set obtained by deleting $v_k$ from S. Prove that if S is not contained in a
half space, then S′ spans V.
(c) Prove that the following conditions are equivalent:
(i) S is not contained in a half space.
(ii) Every vector in V is a positive combination of S.
(iii) S spans V and 0 is a positive combination of S.
Hint: To show that (i) implies (ii) or (iii), I recommend projecting to the space U
orthogonal to $v_k$. That will allow you to use induction.


M.10. The row and column indices in the n × n Fourier matrix A run from 0 to n − 1, and the i, j
entry is $\zeta^{ij}$, with $\zeta = e^{2\pi i/n}$. This matrix solves the following interpolation problem: Given
complex numbers $b_0, \ldots, b_{n-1}$, find a complex polynomial $f(t) = c_0 + c_1t + \cdots + c_{n-1}t^{n-1}$
such that $f(\zeta^\nu) = b_\nu$.
(a) Explain how the matrix solves the problem.
(b) Prove that A is symmetric and normal, and compute $A^2$.
*(c) Determine the eigenvalues of A.


M.11. Let A be a real n × n matrix. Prove that A defines an orthogonal projection to its image
W if and only if $A^2 = A = A^tA$.


M.12. Let A be a real n × n orthogonal matrix.
(a) Let X be a complex eigenvector of A with complex eigenvalue $\lambda$. Prove that $X^tX = 0$.
Write the eigenvector as $X = R + Si$ where R and S are real vectors. Show that
the space W spanned by R and S is A-invariant, and describe the restriction of the
operator A to W.
(b) Prove that there is a real orthogonal matrix P such that $P^tAP$ is a block diagonal
matrix made up of 1 × 1 and 2 × 2 blocks, and describe those blocks.


M.13. Let $V = \mathbb{R}^n$, and let $(X, Y) = X^tAY$, where A is a symmetric matrix. Let W be the
subspace of V spanned by the columns of an n × r matrix M of rank r, and let $\pi: V \to W$
denote the orthogonal projection of V to W with respect to the form ( , ). One can
compute $\pi$ in the form $\pi(X) = MY$ by setting up and solving a suitable system of linear
equations for Y. Determine the matrix of $\pi$ explicitly in terms of A and M. Check your
result in the case that r = 1 and ( , ) is dot product. What hypotheses on A and M are
necessary?


M.14. What is the maximal number of vectors $v_i$ in $\mathbb{R}^n$ such that $(v_i \cdot v_j) < 0$ for all $i \neq j$?


M.15.¹ This problem is about the space V of real polynomials in the variables x and y. If f is
a polynomial, $\partial_f$ will denote the operator $f(\frac{\partial}{\partial x}, \frac{\partial}{\partial y})$, and $\partial_f(g)$ will denote the result of
applying this operator to a polynomial g.
(a) The rule $(f, g) = \partial_f(g)_0$ defines a bilinear form on V, the subscript 0 denoting
evaluation of a polynomial at the origin. Prove that this form is symmetric and
positive definite, and that the monomials $x^iy^j$ form an orthogonal basis of V (not an
orthonormal basis).
(b) We also have the operator of multiplication by f, which we write as $m_f$. So
$m_f(g) = fg$. Prove that $\partial_f$ and $m_f$ are adjoint operators.
(c) When $f = x^2 + y^2$, the operator $\partial_f$ is the Laplacian, which is often written as
$\Delta$. A polynomial h is harmonic if $\Delta h = 0$. Let H denote the space of harmonic
polynomials. Identify the space $H^\perp$ orthogonal to H with respect to the given form.


¹ Suggested by Serge Lang.
CHAPTER 9


Linear Groups


In these days the angel of topology and the devil of abstract algebra
fight for the soul of every individual discipline of mathematics.


—Hermann Weyl¹


9.1 THE CLASSICAL GROUPS 


Subgroups of the general linear group GL, are called linear groups, or matrix groups. The 
most important ones are the special linear, orthogonal, unitary, and symplectic groups — the 
classical groups. Some of them will be familiar, but let’s review the definitions. 


The real special linear group $SL_n$ is the group of real matrices with determinant 1:
(9.1.1) $SL_n = \{P \in GL_n(\mathbb{R}) \mid \det P = 1\}$.

The orthogonal group $O_n$ is the group of real matrices P such that $P^t = P^{-1}$:
(9.1.2) $O_n = \{P \in GL_n(\mathbb{R}) \mid P^tP = I\}$.


A change of basis by an orthogonal matrix preserves the dot product $X^tY$ on $\mathbb{R}^n$.
The unitary group $U_n$ is the group of complex matrices P such that $P^* = P^{-1}$:


(9.1.3) $U_n = \{P \in GL_n(\mathbb{C}) \mid P^*P = I\}$.


A change of basis by a unitary matrix preserves the standard Hermitian product $X^*Y$
on $\mathbb{C}^n$.
The symplectic group is the group of real matrices that preserve the skew-symmetric
form $X^tSY$ on $\mathbb{R}^{2n}$, where
$S = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$:


(9.1.4) $SP_{2n} = \{P \in GL_{2n}(\mathbb{R}) \mid P^tSP = S\}$.


1 This quote is taken from Morris Kline’s book Mathematical Thought from Ancient to Modern Times. 
There are analogues of the orthogonal group for indefinite forms. The Lorentz group
is the group of real matrices that preserve the Lorentz form (8.2.2)


(9.1.5) $O_{3,1} = \{P \in GL_4 \mid P^tI_{3,1}P = I_{3,1}\}$.


The linear operators represented by these matrices are called Lorentz transformations. An
analogous group $O_{p,m}$ can be defined for any signature p, m.


The word special is added to indicate the subgroup of matrices with determinant 1: 


Special orthogonal group $SO_n$: real orthogonal matrices with determinant 1,
Special unitary group $SU_n$: unitary matrices with determinant 1.


Though this is not obvious from the definition, symplectic matrices have determinant 1, so 
the two uses of the letter S do not conflict. 
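In the 2 × 2 case the claim about determinants can be checked directly, because a short computation gives $P^tSP = (\det P)S$ for every 2 × 2 matrix P. The sketch below illustrates this identity on sample integer matrices; the helper names are assumptions made for the illustration, and the check says nothing about larger sizes.

```python
def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def det2(P):
    """Determinant of a 2 x 2 matrix."""
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

S = [[0, 1], [-1, 0]]

def preserves_S(P):
    """True if P^t S P = S, i.e. if P is a symplectic 2 x 2 matrix."""
    return matmul(matmul(transpose(P), S), P) == S

for P in ([[2, 3], [1, 2]], [[2, 0], [0, 1]], [[1, 4], [2, 5]]):
    d = det2(P)
    # P^t S P equals det(P) * S, so P preserves S exactly when det P = 1.
    assert matmul(matmul(transpose(P), S), P) == [[0, d], [-d, 0]]
    assert preserves_S(P) == (d == 1)
```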


Many of these groups have complex analogues, defined by the same relations. But 
except in Section 9.8, $GL_n$, $SL_n$, $O_n$, and $SP_{2n}$ stand for the real groups in this chapter.
Note that the complex orthogonal group is not the same as the unitary group. The defining
properties of these two groups are $P^tP = I$ and $P^*P = I$, respectively.


We plan to describe geometric properties of the classical groups, viewing them as 
subsets of the spaces of matrices. The word “homeomorphism” from topology will come 
up. A homeomorphism $\varphi: X \to Y$ is a continuous bijective map whose inverse function
is also continuous [Munkres, p. 105]. Homeomorphic sets are topologically equivalent. It
is important not to confuse the words “homomorphism” and “homeomorphism,” though,
unfortunately, their only difference is that “homeomorphism” has one more letter.

The geometry of a few linear groups will be familiar. The unit circle, 


$x_0^2 + x_1^2 = 1$,


for instance, has several incarnations as a group, all isomorphic. Writing $(x_0, x_1) =
(\cos\theta, \sin\theta)$ identifies the circle as the additive group of angles. Or, thinking of it as
the unit circle in the complex plane by $e^{i\theta}$, it becomes a multiplicative group, the group of
unitary 1 × 1 matrices:


(9.1.6) $U_1 = \{p \in \mathbb{C}^\times \mid \overline{p}p = 1\}$.


The unit circle can also be embedded into $\mathbb{R}^{2\times 2}$ by the map


(9.1.7) $(\cos\theta, \sin\theta) \mapsto \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.


It is isomorphic to the special orthogonal group $SO_2$, the group of rotations of the plane.
These are three descriptions of what is essentially the same group, the circle group.

The dimension of a linear group G is, roughly speaking, the number of degrees of
freedom of a matrix in G. The circle group has dimension 1. The group $SL_2$ has dimension
3, because the equation det P = 1 eliminates one degree of freedom from the four matrix
entries. We discuss dimension more carefully in Section 9.7, but we want to describe some
of the low-dimensional groups first. The smallest dimension in which really interesting
nonabelian groups appear is 3, and the most important ones are $SU_2$, $SO_3$, and $SL_2$. We
examine the special unitary group $SU_2$ and the rotation group $SO_3$ in Sections 9.3 and 9.4.
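The agreement of the three descriptions can be tested numerically: adding angles, multiplying the complex numbers $e^{i\theta}$, and multiplying the rotation matrices of (9.1.7) should all give matching answers. The sketch below uses floating-point arithmetic, so it compares results up to a small tolerance; the helper names, the sample angles, and the tolerance are assumptions made for the illustration.

```python
import cmath
import math

def rotation(theta):
    """The matrix of (9.1.7) attached to the angle theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to the tolerance tol."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

alpha, beta = 0.7, 1.9   # sample angles

# Adding angles corresponds to multiplying rotation matrices ...
assert close(matmul(rotation(alpha), rotation(beta)), rotation(alpha + beta))
# ... and to multiplying unit complex numbers.
assert abs(cmath.exp(1j * alpha) * cmath.exp(1j * beta) - cmath.exp(1j * (alpha + beta))) < 1e-12
```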
9.2 INTERLUDE: SPHERES

By analogy with the unit sphere in $\mathbb{R}^3$, the locus
$\{x_0^2 + x_1^2 + \cdots + x_n^2 = 1\}$


in $\mathbb{R}^{n+1}$ is called the n-dimensional unit sphere, or the n-sphere, for short. We’ll denote it by
$S^n$. Thus the unit sphere in $\mathbb{R}^3$ is the 2-sphere $S^2$, and the unit circle in $\mathbb{R}^2$ is the 1-sphere $S^1$.
A space that is homeomorphic to a sphere may sometimes be called a sphere too. 

We review stereographic projection from the 2-sphere to the plane, because it can be 
used to give topological descriptions of the sphere that have analogues in other dimensions. 
We think of the $x_0$-axis as the vertical axis in $(x_0, x_1, x_2)$-space $\mathbb{R}^3$. The north pole on the
sphere is the point p = (1, 0, 0). We also identify the locus $\{x_0 = 0\}$ with a plane that we
call V, and we label the coordinates in V as $v_1, v_2$. The point $(v_1, v_2)$ of V corresponds to
$(0, v_1, v_2)$ in $\mathbb{R}^3$.

Stereographic projection $\pi: S^2 \to V$ is defined as follows: To obtain the image $\pi(x)$ of
a point x on the sphere, one constructs the line $\ell$ that passes through p and x. The projection
$\pi(x)$ is the intersection of $\ell$ with V. The projection is bijective at all points of $S^2$ except the
north pole, which is “sent to infinity.”


(9.2.1) Stereographic Projection. 


One way to construct the sphere topologically is as the union of the plane V and a single 
point, the north pole. The inverse function to $\pi$ does this. It shrinks the plane a lot near
infinity, because a small circle about p on the sphere corresponds to a large circle in the plane. 

Stereographic projection is the identity map on the equator. It maps the southern 
hemisphere bijectively to the unit disk $\{v_1^2 + v_2^2 \leq 1\}$ in V, and the northern hemisphere to
the exterior $\{v_1^2 + v_2^2 \geq 1\}$ of the disk, except that the north pole is missing from the exterior.
On the other hand, stereographic projection from the south pole would map the northern
hemisphere to the disk. Both hemispheres correspond bijectively to disks. This provides a
second way to build the sphere topologically, as the union of two unit disks glued together
along their boundaries. The disks need to be stretched, like blowing up a balloon, to make
the actual sphere.

To determine the formula for stereographic projection, we write the line through p
and x in the parametric form $q(t) = p + t(x - p) = (1 + t(x_0 - 1), tx_1, tx_2)$. The point $q(t)$
is in the plane V when $t = \frac{1}{1 - x_0}$. So


(9.2.2) $\pi(x) = (v_1, v_2) = \left(\dfrac{x_1}{1 - x_0}, \dfrac{x_2}{1 - x_0}\right)$.
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Stereographic projection 7 from the n-sphere to n-space is defined in exactly the 
same way. The north pole on the n-sphere is the point p = (1,0, ..., 0), and we identify 
the locus {xp = 0} in R”+! with an n-space V. A point (v;,..., Un) of V corresponds to 
(O; vp, .. pay RE The image 7r(x) of a point x on the sphere is the intersection of the 
line @ through the north pole p and x with V. As before, the north pole p is sent to infinity, 
and zz is bijective at all points of S’ except p. The formula for zz is 


(9.2.3) $\pi(x) = \left( \frac{x_1}{1 - x_0}, \ldots, \frac{x_n}{1 - x_0} \right).$

This projection maps the lower hemisphere $\{x_0 \le 0\}$ bijectively to the $n$-dimensional
unit ball in $V$, the locus $\{v_1^2 + \cdots + v_n^2 \le 1\}$, while projection from the south pole maps the
upper hemisphere $\{x_0 \ge 0\}$ to the unit ball. So, as is true for the 2-sphere, the $n$-sphere can
be constructed topologically in two ways: as the union of an $n$-space $V$ and a single point
$p$, or as the union of two copies of the $n$-dimensional unit ball, glued together along their
boundaries, which are $(n - 1)$-spheres, and stretched appropriately.
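Formula (9.2.3) and its inverse are easy to experiment with numerically. The following is a minimal sketch in plain Python; the function names `stereographic` and `inverse_stereographic` are illustrative choices, not notation from the text, and the inverse formula is obtained by solving (9.2.3) together with the unit-length condition.

```python
import math

def stereographic(x):
    """Project x = (x0, ..., xn) on the unit n-sphere to V from the north pole."""
    x0 = x[0]
    if math.isclose(x0, 1.0):
        raise ValueError("the north pole is sent to infinity")
    return tuple(xi / (1.0 - x0) for xi in x[1:])

def inverse_stereographic(v):
    """Map v in V back to the sphere; the north pole is never reached."""
    r2 = sum(vi * vi for vi in v)            # squared length of v
    scale = 2.0 / (r2 + 1.0)                 # this equals 1 - x0
    return ((r2 - 1.0) / (r2 + 1.0),) + tuple(scale * vi for vi in v)

# The south pole goes to the origin, and points of the equator are fixed.
south = stereographic((-1.0, 0.0, 0.0))
on_equator = stereographic((0.0, 0.6, 0.8))
```

Points of $V$ inside the unit ball come back with $x_0 < 0$, which matches the statement that the lower hemisphere corresponds to the ball.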


We are particularly interested in the three-dimensional sphere $S^3$, and it is worth making
some effort to become acquainted with this locus. Topologically, $S^3$ can be constructed either
as the union of 3-space $V$ and a single point $p$, or as the union of two copies of the unit
ball $\{v_1^2 + v_2^2 + v_3^2 \le 1\}$ in $\mathbb{R}^3$, glued together along their boundaries (which are ordinary
2-spheres) and stretched. Neither construction can be made in three-dimensional space.

We can think of V as the space in which we live. Then via stereographic projection, 
the lower hemisphere of the 3-sphere $S^3$ corresponds to the unit ball in space. Traditionally,
it is depicted as the terrestrial sphere, the Earth. The upper hemisphere corresponds to the 
exterior of the Earth, the sky. 

On the other hand, the upper hemisphere can be made to correspond to the unit ball 
via projection from the south pole. When thinking of it this way, it is depicted traditionally as 
the celestial sphere. (The phrases “terrestrial ball” and “celestial ball” would fit mathematical
terminology better, but they wouldn’t be traditional.)


(9.2.4) A Model of the Celestial Sphere.

To understand this requires some thought. When the upper hemisphere is represented
as the celestial sphere, the center of the ball corresponds to the north pole of $S^3$, and to
infinity in our space V. While looking at a celestial globe from its exterior, you must imagine 
that you are standing on the Earth, looking out at the sky. It is a common mistake to think 
of the Earth as the center of the celestial sphere. 


Latitudes and Longitudes on the 3-Sphere 


The curves of constant latitude on the globe, the 2-sphere $\{x_0^2 + x_1^2 + x_2^2 = 1\}$, are the
horizontal circles $x_0 = c$, with $-1 < c < 1$, and the curves of constant longitude are the
vertical great circles through the poles. The longitude curves can be described as intersections
of the 2-sphere with the two-dimensional subspaces of $\mathbb{R}^3$ that contain the pole $(1, 0, 0)$.

When we go to the 3-sphere $\{x_0^2 + x_1^2 + x_2^2 + x_3^2 = 1\}$, the dimension increases, and one
has to make some decisions about what the analogues should be. We use analogues that will
have algebraic significance for the group SU2 that we study in the next section.

As analogues of latitude curves on the 3-sphere, we take the “horizontal” surfaces,
the surfaces on which the $x_0$-coordinate is constant. We call these loci latitudes. They are
two-dimensional spheres, embedded into $\mathbb{R}^4$ by

(9.2.5) $x_0 = c, \quad x_1^2 + x_2^2 + x_3^2 = 1 - c^2, \quad \text{with } -1 < c < 1.$


The particular latitude defined by $x_0 = 0$ is the intersection of the 3-sphere with the
horizontal space $V$. It is the unit 2-sphere $\{v_1^2 + v_2^2 + v_3^2 = 1\}$ in $V$. We call this latitude the
equator, and we denote it by $E$.

Next, as analogues of the longitude curves, we take the great circles through the north
pole $(1, 0, 0, 0)$. They are the intersections of the 3-sphere with two-dimensional subspaces
$W$ of $\mathbb{R}^4$ that contain the pole. The intersection $L = W \cap S^3$ will be the unit circle in $W$, and
we call $L$ a longitude. If we choose an orthonormal basis $(p, v)$ for the space $W$, the first
vector being the north pole, the longitude will have the parametrization


(9.2.6) $L: \quad \ell(\theta) = (\cos\theta)\, p + (\sin\theta)\, v.$

This is elementary, but we verify it below.

Thus, while the latitudes on $S^3$ are 2-spheres, the longitudes are 1-spheres.


Lemma 9.2.7 Let $(p, v)$ be an orthonormal basis for a subspace $W$ of $\mathbb{R}^4$, the first vector
being the north pole $p$, and let $L$ be the longitude of unit vectors in $W$.


(a) L meets the equator E in two points. If v is one of those points, the other one is -v. 


(b) $L$ has the parametrization (9.2.6). If $q$ is a point of $L$, then replacing $v$ by $-v$ if necessary,
one can express $q$ in the form $\ell(\theta)$ with $\theta$ in the interval $0 \le \theta \le \pi$, and then this
representation of a point of $L$ is unique for all $\theta \ne 0, \pi$.

(c) Except for the two poles, every point of the sphere $S^3$ lies on a unique longitude.
Proof. We omit the proof of (a).

(b) This is seen by computing the length of a vector $ap + bv$ of $W$:

$|ap + bv|^2 = a^2 (p \cdot p) + 2ab (p \cdot v) + b^2 (v \cdot v) = a^2 + b^2.$

So $ap + bv$ is a unit vector if and only if the point $(a, b)$ lies on the unit circle, in which case
$a = \cos\theta$ and $b = \sin\theta$ for some $\theta$.


(c) Let $x$ be a unit vector in $\mathbb{R}^4$, not on the vertical axis. Then the set $(p, x)$ is independent,
and therefore spans a two-dimensional subspace $W$ containing $p$. So $x$ lies in just one such
subspace, and in just one longitude. □
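Part (c) of the lemma is constructive: the longitude through a point is found by splitting off its component along the north pole. Here is a small sketch under that reading; the helper names `longitude_through` and `point_on_longitude` are hypothetical.

```python
import math

def longitude_through(x):
    """For a unit vector x in R^4 other than the poles, return (theta, v) with
    x = cos(theta) p + sin(theta) v, 0 < theta < pi, and v on the equator E."""
    x0, rest = x[0], x[1:]
    s = math.sqrt(sum(c * c for c in rest))   # sin(theta), positive off the poles
    if s == 0.0:
        raise ValueError("a pole lies on every longitude")
    theta = math.atan2(s, x0)                  # strictly between 0 and pi
    v = (0.0,) + tuple(c / s for c in rest)    # unit vector orthogonal to p
    return theta, v

def point_on_longitude(theta, v):
    """Evaluate the parametrization (9.2.6) with p = (1, 0, 0, 0)."""
    p = (1.0, 0.0, 0.0, 0.0)
    return tuple(math.cos(theta) * pc + math.sin(theta) * vc for pc, vc in zip(p, v))

theta, v = longitude_through((0.5, 0.5, 0.5, 0.5))
```

For the sample point the angle comes out as $\pi/3$, since $\cos\theta = x_0 = 1/2$.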


9.3 THE SPECIAL UNITARY GROUP SU2


The elements of SU2 are complex $2 \times 2$ matrices of the form

(9.3.1) $P = \begin{bmatrix} a & b \\ -\bar{b} & \bar{a} \end{bmatrix}, \quad \text{with } \bar{a}a + \bar{b}b = 1.$


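Let's verify this. Let $P = \begin{bmatrix} a & b \\ u & v \end{bmatrix}$ be an element of SU2, with $a, b, u, v$ in $\mathbb{C}$. The equations
that define SU2 are $P^* = P^{-1}$ and $\det P = 1$. When $\det P = 1$, the equation $P^* = P^{-1}$
becomes

$\begin{bmatrix} \bar{a} & \bar{u} \\ \bar{b} & \bar{v} \end{bmatrix} = P^* = P^{-1} = \begin{bmatrix} v & -b \\ -u & a \end{bmatrix}.$

Therefore $v = \bar{a}$, $u = -\bar{b}$, and then $\det P = \bar{a}a + \bar{b}b = 1$. □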


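Writing $a = x_0 + x_1 i$ and $b = x_2 + x_3 i$ defines a bijective correspondence of SU2 with
the unit 3-sphere $\{x_0^2 + x_1^2 + x_2^2 + x_3^2 = 1\}$ in $\mathbb{R}^4$.

(9.3.2) $\mathrm{SU}_2 \longleftrightarrow S^3: \qquad P = \begin{bmatrix} x_0 + x_1 i & x_2 + x_3 i \\ -x_2 + x_3 i & x_0 - x_1 i \end{bmatrix} \longleftrightarrow (x_0, x_1, x_2, x_3).$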


This gives us two notations for an element of SU2. We use the matrix notation as much as 
possible, because it is best for computation in the group, but length and orthogonality refer 
to dot product in $\mathbb{R}^4$.
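The two notations can be converted into each other mechanically. The sketch below stores a $2 \times 2$ complex matrix as a tuple of rows; the helper names (`su2_from_point`, `point_from_su2`, `matmul`, `adjoint`, `det`) are illustrative assumptions, not names used in the text.

```python
def su2_from_point(x):
    """Matrix (9.3.2) attached to a point x = (x0, x1, x2, x3) of the unit 3-sphere."""
    a = complex(x[0], x[1])
    b = complex(x[2], x[3])
    return ((a, b), (-b.conjugate(), a.conjugate()))

def point_from_su2(P):
    """Read the coordinates (x0, x1, x2, x3) off the first row of P."""
    a, b = P[0]
    return (a.real, a.imag, b.real, b.imag)

def matmul(P, Q):
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def adjoint(P):
    """Conjugate transpose P*."""
    return tuple(tuple(P[j][i].conjugate() for j in range(2)) for i in range(2))

def det(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

P = su2_from_point((0.5, 0.5, 0.5, 0.5))
Q = su2_from_point((0.0, 0.6, 0.0, 0.8))
```

Multiplying `P` and `Q` as matrices and reading off the first row again gives a point of the unit 3-sphere, which is the group structure on $S^3$ in coordinates.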


Note: The fact that the 3-sphere has a group structure is remarkable. There is no way to
make the 2-sphere into a group. A famous theorem of topology asserts that the only spheres
on which one can define continuous group laws are the 1-sphere and the 3-sphere. □


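In matrix notation, the north pole $e_0 = (1, 0, 0, 0)$ on the sphere is the identity matrix $I$.
The other standard basis vectors are the matrices that define the quaternion group (2.4.5).
We list them again for reference:

(9.3.3) $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \mathbf{i} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \quad \mathbf{j} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad \mathbf{k} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix} \quad \longleftrightarrow \quad e_0, e_1, e_2, e_3.$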


These matrices satisfy relations such as $\mathbf{ij} = \mathbf{k}$ that were displayed in (2.4.6). The real vector
space with basis $(I, \mathbf{i}, \mathbf{j}, \mathbf{k})$ is called the quaternion algebra. So SU2 can be thought of as the
set of unit vectors in the quaternion algebra.

Lemma 9.3.4 Except for the two special matrices $\pm I$, the eigenvalues of $P$ (9.3.2) are
complex conjugate numbers of absolute value 1.


Proof. The characteristic polynomial of $P$ is $t^2 - 2x_0 t + 1$, and its discriminant $D$ is $4x_0^2 - 4$.
When $(x_0, x_1, x_2, x_3)$ is on the unit sphere, $x_0$ is in the interval $-1 \le x_0 \le 1$, and $D \le 0$. (In
fact, the eigenvalues of any unitary matrix have absolute value 1.) □


We now describe the algebraic structures on SU2 that correspond to the latitudes and
longitudes on $S^3$ that were defined in the previous section.


Proposition 9.3.5 The latitudes in SU2 are conjugacy classes. For a given $c$ in the interval
$-1 < c < 1$, the latitude $\{x_0 = c\}$ consists of the matrices $P$ in SU2 such that trace $P = 2c$.
The remaining conjugacy classes are $\{I\}$ and $\{-I\}$. They make up the center of SU2.


The proposition follows from the next lemma. 


Lemma 9.3.6 Let $P$ be an element of SU2 with eigenvalues $\lambda$ and $\bar\lambda$. There is an element $Q$
in SU2 such that $Q^*PQ$ is the diagonal matrix $\Lambda$ with diagonal entries $\lambda$ and $\bar\lambda$. Therefore all
elements of SU2 with the same eigenvalues, or with the same trace, are conjugate.


Proof. One can base a proof of the lemma on the Spectral Theorem for unitary operators,
or verify it directly as follows: Let $X = (u, v)^t$ be an eigenvector of $P$ of length 1, with
eigenvalue $\lambda$, and let $Y = (-\bar{v}, \bar{u})^t$. You will be able to check that $Y$ is an eigenvector of $P$
with eigenvalue $\bar\lambda$, that the matrix $Q = \begin{bmatrix} u & -\bar{v} \\ v & \bar{u} \end{bmatrix}$ is in SU2, and that $PQ = Q\Lambda$. □
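The direct verification in the proof can be carried out numerically. The sketch below builds $Q$ from a unit eigenvector exactly as described; the function name `diagonalize`, the tolerance `1e-12`, and the choice of $\lambda$ with nonnegative imaginary part are assumptions made for the illustration.

```python
import math

def matmul(P, Q):
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def adjoint(P):
    return tuple(tuple(P[j][i].conjugate() for j in range(2)) for i in range(2))

def diagonalize(P):
    """Return (lam, Q) with Q in SU2 and Q* P Q = diag(lam, conj(lam))."""
    a, b = P[0]
    x0 = a.real                                  # trace P = 2 * x0
    lam = complex(x0, math.sqrt(max(0.0, 1.0 - x0 * x0)))
    if abs(b) > 1e-12:
        u, v = b, lam - a                        # solves (a - lam) u + b v = 0
    elif abs(a - lam) < 1e-12:
        u, v = 1.0, 0.0                          # P is already diag(lam, conj(lam))
    else:
        u, v = 0.0, 1.0                          # P is diag(conj(lam), lam)
    norm = math.sqrt(abs(u) ** 2 + abs(v) ** 2)
    u, v = complex(u) / norm, complex(v) / norm
    Q = ((u, -v.conjugate()), (v, u.conjugate()))
    return lam, Q

P = ((0.5 + 0.5j, 0.5 + 0.5j), (-0.5 + 0.5j, 0.5 - 0.5j))
lam, Q = diagonalize(P)
D = matmul(adjoint(Q), matmul(P, Q))             # should be diag(lam, conj(lam))
```

Since $\lambda$ depends only on $x_0 = \tfrac12\,\text{trace}\,P$, two matrices with the same trace are conjugated to the same diagonal matrix, which is the content of the lemma.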


The equator $E$ of SU2 is the latitude defined by the equation trace $P = 0$ (or $x_0 = 0$).
A point on the equator has the form

(9.3.7) $A = \begin{bmatrix} x_1 i & x_2 + x_3 i \\ -x_2 + x_3 i & -x_1 i \end{bmatrix} = x_1 \mathbf{i} + x_2 \mathbf{j} + x_3 \mathbf{k}.$

Notice that the matrix $A$ is skew-Hermitian: $A^* = -A$, and that its trace is zero. We haven’t
run across skew-Hermitian matrices before, but they are closely related to Hermitian
matrices. A matrix $A$ is skew-Hermitian if and only if $iA$ is Hermitian.

The $2 \times 2$ skew-Hermitian matrices with trace zero form a real vector space of
dimension 3 that we denote by $V$, in agreement with the notation used in the previous
section. The space $V$ is the orthogonal space to $I$. It has the basis $(\mathbf{i}, \mathbf{j}, \mathbf{k})$, and $E$ is the unit
2-sphere in $V$.


Proposition 9.3.8 The following conditions on an element $A$ of SU2 are equivalent:
• $A$ is on the equator, i.e., trace $A = 0$,
• the eigenvalues of $A$ are $i$ and $-i$,
• $A^2 = -I$.


Proof. The equivalence of the first two statements follows by inspection of the characteristic
polynomial $t^2 - (\text{trace}\, A)t + 1$. For the third statement, we note that $-I$ is the only matrix
in SU2 with an eigenvalue $-1$. If $\lambda$ is an eigenvalue of $A$, then $\lambda^2$ is an eigenvalue of $A^2$. So
$\lambda = \pm i$ if and only if $A^2$ has eigenvalues $-1$, in which case $A^2 = -I$. □


Next, we consider the longitudes of SU2, the intersections of SU2 with two-dimensional
subspaces of $\mathbb{R}^4$ that contain the pole $I$. We use matrix notation.


Proposition 9.3.9 Let $W$ be a two-dimensional subspace of $\mathbb{R}^4$ that contains $I$, and let $L$ be
the longitude of unit vectors in $W$.


(a) $L$ meets the equator $E$ in two points. If $A$ is one of them, the other one is $-A$. Moreover,
$(I, A)$ is an orthonormal basis of $W$.

(b) The elements of $L$ can be written in the form $P_\theta = (\cos\theta)I + (\sin\theta)A$, with $A$ on $E$
and $0 \le \theta \le 2\pi$. When $P \ne \pm I$, $A$ and $\theta$ can be chosen with $0 < \theta < \pi$, and then the
expression for $P$ is unique.

(c) Every element of SU2 except $\pm I$ lies on a unique longitude. The elements $\pm I$ lie on
every longitude.

(d) The longitudes are conjugate subgroups of SU2. 


Proof. When one translates to matrix notation, the first three assertions become Lemma
9.2.7. To prove (d), we first verify that a longitude $L$ is a subgroup. Let $c, s$ and $c', s'$ denote
the cosine and sine of the angles $\alpha$ and $\alpha'$, respectively, and let $\beta = \alpha + \alpha'$. Then because
$A^2 = -I$, the addition formulas for cosine and sine show that

$(cI + sA)(c'I + s'A) = (cc' - ss')I + (cs' + sc')A = (\cos\beta)I + (\sin\beta)A.$


So L is closed under multiplication. It is also closed under inversion. 

Finally, we verify that the longitudes are conjugate. Say that $L$ is the longitude
$P_\theta = cI + sA$, as above. Proposition 9.3.5 tells us that $A$ is conjugate to $\mathbf{i}$, say $\mathbf{i} = QAQ^*$.
Then $QP_\theta Q^* = cQIQ^* + sQAQ^* = cI + s\mathbf{i}$. So $L$ is conjugate to the longitude $cI + s\mathbf{i}$. □
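The subgroup property rests on the single relation $A^2 = -I$, and this can be checked on a sample point of the equator. In the sketch below, `equator_point`, `longitude_element`, and `close` are hypothetical helper names, and the sample vector $(2/3, -1/3, 2/3)$ is an arbitrary unit vector.

```python
import math

def matmul(P, Q):
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def equator_point(x1, x2, x3):
    """The trace-zero matrix (9.3.7); (x1, x2, x3) should be a unit vector."""
    return ((complex(0, x1), complex(x2, x3)), (complex(-x2, x3), complex(0, -x1)))

def longitude_element(theta, A):
    """P_theta = (cos theta) I + (sin theta) A."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c + s * A[0][0], s * A[0][1]), (s * A[1][0], c + s * A[1][1]))

def close(P, Q, tol=1e-12):
    return all(abs(P[i][j] - Q[i][j]) < tol for i in range(2) for j in range(2))

A = equator_point(2 / 3, -1 / 3, 2 / 3)
A_squared = matmul(A, A)                                   # expected: -I
product = matmul(longitude_element(0.7, A), longitude_element(1.9, A))
```

The product agrees with `longitude_element(0.7 + 1.9, A)`, so angles add along a longitude, and the inverse of $P_\theta$ is $P_{-\theta}$.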


Examples 9.3.10 


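• The longitude $cI + s\mathbf{i}$, with $c = \cos\theta$ and $s = \sin\theta$, is the group of diagonal matrices
in SU2. We denote this longitude by $T$. Its elements have the form

$c \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + s \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix} = \begin{bmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{bmatrix}.$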


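• The longitude $cI + s\mathbf{j}$ is the group of real matrices in SU2, the rotation group SO2.
The matrix $cI + s\mathbf{j}$ represents rotation of the plane through the angle $-\theta$.

$c \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + s \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}.$

• We haven’t run across the longitude $cI + s\mathbf{k}$ before. □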


The figure below was made by Bill Schelter. It shows a projection of the 3-sphere SU2
onto the unit disc in the plane. The elliptical disc shown is the image of the equator. Just
as the orthogonal projection of a circle from $\mathbb{R}^3$ to $\mathbb{R}^2$ is an ellipse, the projection of the
2-sphere $E$ from $\mathbb{R}^4$ to $\mathbb{R}^3$ is an ellipsoid, and the further projection of this ellipsoid to the
plane maps it onto an elliptical disc. Every point in the interior of the disc is the image of
two points of $E$.


(9.3.11) Some Latitudes and Longitudes in SU2. [Figure labels: diagonal matrices; trace-zero matrices.]


9.4 THE ROTATION GROUP SO3


Since the equator $E$ of SU2 is a conjugacy class, the group operates on it by conjugation.
We will show that conjugation by an element $P$ of SU2, an operation that we denote by $\gamma_P$,
rotates this sphere. This will allow us to describe the three-dimensional rotation group SO3
in terms of the special unitary group SU2.

The poles of a nontrivial rotation of $E$ are its fixed points, the intersections of $E$ with
the axis of rotation (5.1.22). If $A$ is on $E$, $(A, \alpha)$ will denote the spin that rotates $E$ with angle
$\alpha$ about the pole $A$. The two spins $(A, \alpha)$ and $(-A, -\alpha)$ represent the same rotation.


Theorem 9.4.1 


(a) The rule $P \rightsquigarrow \gamma_P$ defines a surjective homomorphism $\gamma: \mathrm{SU}_2 \to \mathrm{SO}_3$, the spin
homomorphism. Its kernel is the center $\{\pm I\}$ of SU2.

(b) Suppose that $P = (\cos\theta)I + (\sin\theta)A$, with $0 < \theta < \pi$ and with $A$ on $E$. Then $\gamma_P$ rotates
$E$ about the pole $A$, through the angle $2\theta$. So $\gamma_P$ is represented by the spin $(A, 2\theta)$.


The homomorphism $\gamma$ described by this theorem is called the orthogonal representation of
SU2. It sends a matrix $P$ in SU2, a complex $2 \times 2$ matrix, to a mysterious real $3 \times 3$ rotation
matrix, the matrix of $\gamma_P$. The theorem tells us that every element of SU2 except $\pm I$ can
be described as a nontrivial rotation together with a choice of spin. Because of this, SU2 is
often called the spin group.


We discuss the geometry of the map $\gamma$ before proving the theorem. If $P$ is a point of
SU2, the point $-P$ is its antipodal point. Since $\gamma$ is surjective and since its kernel is the center
$Z = \{\pm I\}$, SO3 is isomorphic to the quotient group SU2/$Z$, whose elements are pairs of
antipodal points, the cosets $\{\pm P\}$ of $Z$. Because $\gamma$ is two-to-one, SU2 is called a double
covering of SO3.

The homomorphism $\mu: \mathrm{SO}_2 \to \mathrm{SO}_2$ of the 1-sphere to itself defined by $\rho_\theta \rightsquigarrow \rho_{2\theta}$
is another, closely related, example of a double covering. Every fibre of $\mu$ consists of two
rotations, $\rho_\theta$ and $\rho_{\theta+\pi}$.

The orthogonal representation helps to describe the topological structure of the 
rotation group. Since elements of SO3 correspond to pairs of antipodal points of SU2, we 
can obtain SO3 topologically by identifying antipodal points on the 3-sphere. The space 
obtained in this way is called (real) projective 3-space, and is denoted by $P^3$.


(9.4.2) SO3 is homeomorphic to projective 3-space $P^3$.


Points of $P^3$ are in bijective correspondence with one-dimensional subspaces of $\mathbb{R}^4$. Every
one-dimensional subspace meets the unit 3-sphere in a pair of antipodal points.

The projective space $P^3$ is much harder to visualize than the sphere $S^3$. However, it is
easy to describe projective 1-space $P^1$, the set obtained by identifying antipodal points of
the unit circle $S^1$. If we wrap $S^1$ around so that it becomes the left-hand figure of (9.4.3), the
figure on the right will be $P^1$. Topologically, $P^1$ is a circle too.


(9.4.3) A Double Covering of the 1-Sphere. 


We'll describe $P^1$ again, in a way that one can attempt to extend to higher dimensional
projective spaces. Except for the two points on the horizontal axis, every pair of antipodal
points of the unit circle contains just one point in the lower semicircle. So to obtain $P^1$, we
simply identify a point pair with a single point in the lower semicircle. But the endpoints of 
the semicircle, the two points on the horizontal axis, must still be identified. So we glue the 
endpoints together, obtaining a circle as before. 

In principle, the same method can be used to describe $P^2$. Except for points on the
equator of the 2-sphere, a pair of antipodal points contains just one point in the lower
hemisphere. So we can form $P^2$ from the lower hemisphere by identifying opposite points of
the equator. Let’s imagine that we start making this identification by gluing a short segment
of the equator to the opposite segment. Unfortunately, when we orient the equator to keep
track, we see that the opposite segment gets the opposite orientation. So when we glue the
two segments together, we have to insert a twist. This gives us, topologically, a Möbius band,
and $P^2$ contains this Möbius band. It is not an orientable surface.

Then to visualize $P^3$, we would take the lower hemisphere in $S^3$ and identify antipodal
points of its equator $E$. Or, we could take the terrestrial ball and identify antipodal points of
its boundary, the surface of the Earth. This is quite confusing. □

We begin the proof of Theorem 9.4.1 now. We recall that the equator $E$ is the unit
2-sphere in the three-dimensional space $\mathbb{V}$ of trace zero, skew-Hermitian matrices (9.3.7).
Conjugation by an element $P$ of SU2 preserves both the trace and the skew-Hermitian
property, so this conjugation, which we are denoting by $\gamma_P$, operates on the whole space $\mathbb{V}$.
The main point is to show that $\gamma_P$ is a rotation. This is done in Lemma 9.4.5 below.

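Let $\langle U, V \rangle$ denote the form on the space $\mathbb{V}$ of trace-zero skew-Hermitian matrices that is carried over from dot product on $\mathbb{R}^3$.
The basis of $\mathbb{V}$ that corresponds to the standard basis of $\mathbb{R}^3$ is $(\mathbf{i}, \mathbf{j}, \mathbf{k})$ (9.3.3). We write
$U = u_1\mathbf{i} + u_2\mathbf{j} + u_3\mathbf{k}$ and use analogous notation for $V$. Then

$\langle U, V \rangle = u_1v_1 + u_2v_2 + u_3v_3.$
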
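Lemma 9.4.4 With notation as above, $\langle U, V \rangle = -\tfrac{1}{2}\,\text{trace}(UV)$.


Proof. We compute the product $UV$ using the quaternion relations (2.4.6):

$UV = (u_1\mathbf{i} + u_2\mathbf{j} + u_3\mathbf{k})(v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}) = -(u_1v_1 + u_2v_2 + u_3v_3) I + U \times V,$

where $U \times V$ is the vector cross product

$U \times V = (u_2v_3 - u_3v_2)\mathbf{i} + (u_3v_1 - u_1v_3)\mathbf{j} + (u_1v_2 - u_2v_1)\mathbf{k}.$

Then because trace $I = 2$, and because $\mathbf{i}, \mathbf{j}, \mathbf{k}$ have trace zero,

$\text{trace}(UV) = -2(u_1v_1 + u_2v_2 + u_3v_3) = -2\langle U, V \rangle.$ □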
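The identity of Lemma 9.4.4 can be checked on integer vectors, where the complex arithmetic is exact. In this sketch, `equator_vector` and `trace_form` are illustrative names, and the vectors $(1, 2, 3)$ and $(4, -5, 6)$ are arbitrary test data.

```python
def matmul(P, Q):
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def equator_vector(u):
    """The matrix u1 i + u2 j + u3 k, where i, j, k are the matrices (9.3.3)."""
    u1, u2, u3 = u
    return ((complex(0, u1), complex(u2, u3)), (complex(-u2, u3), complex(0, -u1)))

def trace_form(U, V):
    """The form <U, V> computed as -1/2 trace(UV)."""
    UV = matmul(U, V)
    return -0.5 * (UV[0][0] + UV[1][1]).real

u, v = (1, 2, 3), (4, -5, 6)
U, V = equator_vector(u), equator_vector(v)
dot = sum(a * b for a, b in zip(u, v))       # ordinary dot product in R^3
form = trace_form(U, V)
UV = matmul(U, V)                            # equals -<U,V> I + U x V
```

Here the dot product is 12, and the entries of `UV` also show the cross product $(27, 6, -13)$: the top row of $UV$ is $(-12 + 27i,\; 6 - 13i)$.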


Lemma 9.4.5 The operator $\gamma_P$ is a rotation of $E$ and of $\mathbb{V}$.


Proof. For review, $\gamma_P$ is the operator defined by $\gamma_P U = PUP^*$. The safest way to prove that
this operator is a rotation may be to compute its matrix. But the matrix is too complicated to
give much insight. It is nicer to describe $\gamma$ indirectly. We will show that $\gamma_P$ is an orthogonal
linear operator with determinant 1. Euler’s Theorem 5.1.25 will tell us that it is a rotation.

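To show that $\gamma_P$ is a linear operator, we must show that for all matrices $U$ and $V$ in the space $\mathbb{V}$ and
all real numbers $r$, $\gamma_P(U + V) = \gamma_P U + \gamma_P V$ and $\gamma_P(rU) = r(\gamma_P U)$. We omit this routine
verification. To prove that $\gamma_P$ is orthogonal, we verify the criterion (8.6.9) for orthogonality,
which is

(9.4.6) $\langle \gamma_P U, \gamma_P V \rangle = \langle U, V \rangle.$

This follows from the previous lemma, because trace is preserved by conjugation.

$\langle \gamma_P U, \gamma_P V \rangle = -\tfrac{1}{2}\,\text{trace}((\gamma_P U)(\gamma_P V)) = -\tfrac{1}{2}\,\text{trace}(PUP^*PVP^*) = -\tfrac{1}{2}\,\text{trace}(PUVP^*) = -\tfrac{1}{2}\,\text{trace}(UV) = \langle U, V \rangle.$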


Finally, to show that the determinant of $\gamma_P$ is 1, we recall that the determinant of any
orthogonal matrix is $\pm 1$. Since SU2 is a sphere, it is path connected, and since the
determinant is a continuous function, only one of the two values $\pm 1$ can be taken on by
$\det \gamma_P$. When $P = I$, $\gamma_P$ is the identity operator, which has determinant 1. So $\det \gamma_P = 1$ for
every $P$. □

We now prove part (a) of the theorem. Because $\gamma_P$ is a rotation, $\gamma$ maps SU2 to SO3.
The verification that $\gamma$ is a homomorphism is simple: $\gamma_P\gamma_Q = \gamma_{PQ}$ because

$\gamma_P(\gamma_Q U) = P(QUQ^*)P^* = (PQ)U(PQ)^* = \gamma_{PQ}U.$


We show next that the kernel of $\gamma$ is $\{\pm I\}$. If $P$ is in the kernel, conjugation by $P$ fixes
every element of $E$, which means that $P$ commutes with every such element. Any element of
SU2 can be written in the form $Q = cI + sB$ with $B$ in $E$. Then $P$ commutes with $Q$ too. So $P$
is in the center $\{\pm I\}$ of SU2. The fact that $\gamma$ is surjective will follow, once we identify $2\theta$ as
the angle of rotation, because every angle $\alpha$ has the form $2\theta$, with $0 \le \theta \le \pi$.


Let $P$ be an element of SU2, written in the form $P = (\cos\theta)I + (\sin\theta)A$ with $A$ in $E$. Since
$P$ commutes with $A$, $\gamma_P A = PAP^* = A$, so $A$ is a pole of $\gamma_P$. Let $\alpha$ denote the angle of rotation of $\gamma_P$ about the
pole $A$. To identify this angle, we show first that it is enough to identify the angle for a single
matrix $P$ in a conjugacy class.

Say that $P' = QPQ^*$ $(= \gamma_Q P)$ is a conjugate, where $Q$ is another element of SU2. Then
$P' = (\cos\theta)I + (\sin\theta)A'$, where $A' = \gamma_Q A = QAQ^*$. The angle $\theta$ has not changed.

Next, we apply Corollary 5.1.28, which asserts that if $M$ and $N$ are elements of SO3, and
if $M$ is a rotation with angle $\alpha$ about the pole $X$, then the conjugate $M' = NMN^{-1}$ is a rotation
with the same angle $\alpha$ about the pole $NX$. Since $\gamma$ is a homomorphism, $\gamma_{P'} = \gamma_Q\gamma_P\gamma_Q^{-1}$.
Since $\gamma_P$ is a rotation with angle $\alpha$ about $A$, $\gamma_{P'}$ is a rotation with angle $\alpha$ about $A' = \gamma_Q A$.
The angle $\alpha$ hasn’t changed either.

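This being so, we make the computation for the matrix $P = (\cos\theta)I + (\sin\theta)\mathbf{i}$, which is the
diagonal matrix with diagonal entries $e^{i\theta}$ and $e^{-i\theta}$. We apply $\gamma_P$ to $\mathbf{j}$:

(9.4.7) $\gamma_P\,\mathbf{j} = P\mathbf{j}P^* = \begin{bmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} e^{-i\theta} & 0 \\ 0 & e^{i\theta} \end{bmatrix} = \begin{bmatrix} 0 & e^{2i\theta} \\ -e^{-2i\theta} & 0 \end{bmatrix} = (\cos 2\theta)\,\mathbf{j} + (\sin 2\theta)\,\mathbf{k}.$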


The set $(\mathbf{j}, \mathbf{k})$ is an orthonormal basis of the orthogonal space $W$ to $\mathbf{i}$, and the equation above
shows that $\gamma_P$ rotates the vector $\mathbf{j}$ through the angle $2\theta$ in $W$. The angle of rotation is $2\theta$, as
predicted. This completes the proof of Theorem 9.4.1. □
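The real $3 \times 3$ matrix of $\gamma_P$ can be computed by conjugating the basis $(\mathbf{i}, \mathbf{j}, \mathbf{k})$ and reading off coordinates. The sketch below does this for the diagonal matrix used in (9.4.7); `spin_matrix` and `coordinates` are hypothetical helper names, and $\theta = \pi/6$ is an arbitrary sample angle.

```python
import math

def matmul(P, Q):
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def adjoint(P):
    return tuple(tuple(P[j][i].conjugate() for j in range(2)) for i in range(2))

# The basis (i, j, k) of the space of trace-zero skew-Hermitian matrices, as in (9.3.3).
BASIS = (
    ((1j, 0j), (0j, -1j)),
    ((0j, 1 + 0j), (-1 + 0j, 0j)),
    ((0j, 1j), (1j, 0j)),
)

def coordinates(M):
    """Coordinates (x1, x2, x3) of M = x1 i + x2 j + x3 k, read off from (9.3.7)."""
    return (M[0][0].imag, M[0][1].real, M[0][1].imag)

def spin_matrix(P):
    """Real 3x3 matrix of gamma_P: U -> P U P* with respect to (i, j, k)."""
    Pstar = adjoint(P)
    columns = [coordinates(matmul(P, matmul(B, Pstar))) for B in BASIS]
    return tuple(tuple(columns[c][r] for c in range(3)) for r in range(3))

theta = math.pi / 6
P = ((complex(math.cos(theta), math.sin(theta)), 0j),
     (0j, complex(math.cos(theta), -math.sin(theta))))
R = spin_matrix(P)       # rotation about the first axis through the angle 2 * theta
```

Applying `spin_matrix` to $-P$ gives the same matrix `R`, which illustrates that the kernel of the spin homomorphism contains $-I$.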


9.5 ONE-PARAMETER GROUPS 


In Chapter 5, we used the matrix-valued function

(9.5.1) $e^{tA} = I + \frac{tA}{1!} + \frac{t^2A^2}{2!} + \frac{t^3A^3}{3!} + \cdots$

to describe solutions of the differential equation $\frac{dX}{dt} = AX$. The same function describes the
one-parameter groups in the general linear group, the differentiable homomorphisms from
the additive group $\mathbb{R}^+$ of real numbers to $GL_n$.


Theorem 9.5.2 


(a) Let $A$ be an arbitrary real or complex matrix, and let $GL_n$ denote $GL_n(\mathbb{R})$ or $GL_n(\mathbb{C})$.
The map $\varphi: \mathbb{R}^+ \to GL_n$ defined by $\varphi(t) = e^{tA}$ is a group homomorphism.

(b) Conversely, let $\varphi: \mathbb{R}^+ \to GL_n$ be a differentiable map that is a homomorphism, and let
$A$ denote its derivative $\varphi'(0)$ at the origin. Then $\varphi(t) = e^{tA}$ for all $t$.


Proof. For any real numbers $r$ and $s$, the matrices $rA$ and $sA$ commute. So (see (5.4.4))

(9.5.3) $e^{(r+s)A} = e^{rA} e^{sA}.$

This shows that $e^{tA}$ is a homomorphism. Conversely, let $\varphi: \mathbb{R}^+ \to GL_n$ be a differentiable
homomorphism. Then $\varphi(\Delta t + t) = \varphi(\Delta t)\varphi(t)$ and $\varphi(t) = \varphi(0)\varphi(t)$, so we can factor $\varphi(t)$
out of the difference quotient:

(9.5.4) $\frac{\varphi(\Delta t + t) - \varphi(t)}{\Delta t} = \frac{\varphi(\Delta t) - \varphi(0)}{\Delta t}\, \varphi(t).$

Taking the limit as $\Delta t \to 0$, we see that $\varphi'(t) = \varphi'(0)\varphi(t) = A\varphi(t)$. Therefore $\varphi(t)$ is a
matrix-valued function that solves the differential equation

(9.5.5) $\frac{d\varphi}{dt} = A\varphi.$

The function $e^{tA}$ is another solution, and when $t = 0$, both solutions take the value $I$.
Therefore $\varphi(t) = e^{tA}$ (see (5.4.9)). □

Examples 9.5.6

(a) Let $A$ be the $2 \times 2$ matrix unit $e_{12}$. Then $A^2 = 0$. All but two terms of the series expansion
for the exponential are zero, and $e^{tA} = I + e_{12}t$.

If $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, then $e^{tA} = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$.

(b) The usual parametrization of SO2 is a one-parameter group.

If $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then $e^{tA} = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}$.

(c) The usual parametrization of the unit circle in the complex plane is a one-parameter
group in $U_1$.

If $a$ is a nonzero real number and $\alpha = ai$, then $e^{t\alpha} = [\cos at + i \sin at]$. □
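These examples can be reproduced by summing the series (9.5.1) directly. The sketch below truncates the series after a fixed number of terms, which is adequate for small matrices of modest size but is not a robust general-purpose method; `mat_exp`, `mat_mul`, and the default `terms=40` are assumptions made for the illustration.

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(A, t=1.0, terms=40):
    """Partial sum of the series (9.5.1) for e^{tA}."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # current term t^k A^k / k!
    for k in range(1, terms):
        term = [[t * entry / k for entry in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

shear = mat_exp([[0.0, 1.0], [0.0, 0.0]], t=2.5)        # Example (a)
rotation = mat_exp([[0.0, -1.0], [1.0, 0.0]], t=0.75)   # Example (b)
circle = mat_exp([[2j]], t=0.4)                          # Example (c) with a = 2
```

In Example (a) the series terminates after two terms, so `shear` is exactly `[[1.0, 2.5], [0.0, 1.0]]`. The homomorphism law (9.5.3) can be observed by comparing `mat_mul(mat_exp(A, 0.3), mat_exp(A, 0.45))` with `mat_exp(A, 0.75)`.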


If $\alpha$ is a complex number that is neither real nor purely imaginary, the image of $e^{t\alpha}$ in $\mathbb{C}^\times$ will be a
logarithmic spiral. If $a$ is a nonzero real number, the image of $e^{ta}$ is the positive real axis,
and if $a = 0$ the image consists of the point 1 alone.

If we are given a subgroup $H$ of $GL_n$, we may also ask for one-parameter groups
in $H$, meaning one-parameter groups whose images are in $H$, or differentiable
homomorphisms $\varphi: \mathbb{R}^+ \to H$. It turns out that linear groups of positive dimension always
have one-parameter groups, and they are usually not hard to determine for a particular
group.

Since the one-parameter groups are in bijective correspondence with $n \times n$ matrices,
we are asking for the matrices $A$ such that $e^{tA}$ is in $H$ for all $t$. We will determine the
one-parameter groups in the orthogonal, unitary, and special linear groups.

(9.5.7) Images of Some One-Parameter Groups in $\mathbb{C}^\times = GL_1(\mathbb{C})$.


Proposition 9.5.8 


(a) If $A$ is a real skew-symmetric matrix ($A^t = -A$), then $e^A$ is orthogonal. If $A$ is a complex
skew-Hermitian matrix ($A^* = -A$), then $e^A$ is unitary.

(b) The one-parameter groups in the orthogonal group $O_n$ are the homomorphisms $t \rightsquigarrow e^{tA}$,
where $A$ is a real skew-symmetric matrix.

(c) The one-parameter groups in the unitary group $U_n$ are the homomorphisms $t \rightsquigarrow e^{tA}$,
where $A$ is a complex skew-Hermitian matrix.


Proof. We discuss the complex case.

The relation $(e^A)^* = e^{(A^*)}$ follows from the definition of the exponential, and we know
that $(e^A)^{-1} = e^{-A}$ (5.4.5). So if $A$ is skew-Hermitian, i.e., $A^* = -A$, then $(e^A)^* = (e^A)^{-1}$,
and $e^A$ is unitary. This proves (a) for complex matrices.

Next, if $A$ is skew-Hermitian, so is $tA$, and by what was shown above, $e^{tA}$ is unitary
for all $t$, so it is a one-parameter group in the unitary group. Conversely, suppose that $e^{tA}$ is
unitary for all $t$. We write this as $e^{tA^*} = e^{-tA}$. Then the derivatives of the two sides of this
equation, evaluated at $t = 0$, must be equal, so $A^* = -A$, and $A$ is skew-Hermitian.

The proof for the orthogonal group is the same, when we interpret $A^*$ as $A^t$. □


We consider the special linear group $SL_n$ next.

Lemma 9.5.9 For any square matrix $A$, $e^{\text{trace}\,A} = \det e^A$.


Proof. An eigenvector $X$ of $A$ with eigenvalue $\lambda$ is also an eigenvector of $e^A$ with eigenvalue
$e^\lambda$. So, if $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$, then the eigenvalues of $e^A$ are $e^{\lambda_i}$. The trace
of $A$ is the sum $\lambda_1 + \cdots + \lambda_n$, and the determinant of $e^A$ is the product $e^{\lambda_1} \cdots e^{\lambda_n}$ (4.5.15).
Therefore $e^{\text{trace}\,A} = e^{\lambda_1 + \cdots + \lambda_n} = e^{\lambda_1} \cdots e^{\lambda_n} = \det e^A$. □

Proposition 9.5.10 The one-parameter groups in the special linear group $SL_n$ are the
homomorphisms $t \rightsquigarrow e^{tA}$, where $A$ is a real $n \times n$ matrix whose trace is zero.


Proof. Lemma 9.5.9 shows that if trace $A = 0$, then $\det e^{tA} = e^{t\,\text{trace}\,A} = e^0 = 1$ for all $t$, so
$e^{tA}$ is a one-parameter group in $SL_n$. Conversely, if $\det e^{tA} = 1$ for all $t$, the derivative of
$e^{t\,\text{trace}\,A}$, evaluated at $t = 0$, is zero. The derivative is trace $A$. □
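Lemma 9.5.9 and Proposition 9.5.10 lend themselves to a quick numerical check on $2 \times 2$ matrices. The sketch below again uses a truncated power series for the exponential, which is an assumption of the illustration rather than a recommended algorithm; the names `mat_exp` and `det2` and the sample matrices are hypothetical.

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(A, t=1.0, terms=40):
    """Partial sum of the exponential series for e^{tA}."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[t * entry / k for entry in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[1.0, 2.0], [3.0, -0.5]]                 # trace 0.5
lhs = det2(mat_exp(A))                        # det e^A
rhs = math.exp(A[0][0] + A[1][1])             # e^{trace A}

B = [[1.0, 2.0], [3.0, -1.0]]                 # trace zero
dets = [det2(mat_exp(B, t)) for t in (-1.0, 0.5, 1.5)]
```

Up to rounding, `lhs` agrees with `rhs`, and every entry of `dets` is 1, so $t \rightsquigarrow e^{tB}$ stays in $SL_2$.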


The simplest one-parameter group in $SL_2$ is the one in Example 9.5.6(a). The
one-parameter groups in SU2 are the longitudes described in (9.3.9).


9.6 THE LIE ALGEBRA 


The space of tangent vectors to a matrix group G at the identity is called the Lie algebra of 
the group. We denote it by Lie(G). It is called an algebra because it has a law of composition, 
the bracket operation that is defined below. 

For instance, when we represent the circle group as the unit circle in the complex plane, 
the Lie algebra is the space of real multiples of i. 

The observation from which the definition of tangent vector is derived is something
we learn in calculus: If $\varphi(t) = (\varphi_1(t), \ldots, \varphi_k(t))$ is a differentiable path in $\mathbb{R}^k$, the velocity
vector $v = \varphi'(0)$ is tangent to the path at the point $x = \varphi(0)$. A vector $v$ is said to be tangent
to a subset $S$ of $\mathbb{R}^k$ at a point $x$ if there is a differentiable path $\varphi(t)$, defined for sufficiently
small $t$ and lying entirely in $S$, such that $\varphi(0) = x$ and $\varphi'(0) = v$.

The elements of a linear group $G$ are matrices, so a path $\varphi(t)$ in $G$ will be a
matrix-valued function. Its derivative $\varphi'(0)$ at $t = 0$ will be represented naturally as a matrix,
and if $\varphi(0) = I$, the matrix $\varphi'(0)$ will be an element of Lie($G$). For example, the usual
parametrization (9.5.6)(b) of the group SO2 shows that the matrix $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ is in Lie(SO2).

We already know a few paths in the orthogonal group $O_n$: the one-parameter
groups $\varphi(t) = e^{At}$, where $A$ is a skew-symmetric matrix (9.5.8). Since $(e^{At})_{t=0} = I$ and
$\left(\frac{d}{dt}e^{At}\right)_{t=0} = A$, every skew-symmetric matrix $A$ is a tangent vector to $O_n$ at the identity, an
element of its Lie algebra. We show now that the Lie algebra consists precisely of those
matrices. Since one-parameter groups are very special, this isn’t completely obvious. There
are many other paths.


Proposition 9.6.1 The Lie algebra of the orthogonal group $O_n$ consists of the
skew-symmetric matrices.


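Proof. We denote transpose by $*$. If $\varphi$ is a path in $O_n$ with $\varphi(0) = I$ and $\varphi'(0) = A$, then
$\varphi(t)^*\varphi(t) = I$ identically, and so $\frac{d}{dt}(\varphi(t)^*\varphi(t)) = 0$. Then

$\frac{d}{dt}(\varphi^*\varphi)\Big|_{t=0} = \left(\frac{d\varphi^*}{dt}\varphi + \varphi^*\frac{d\varphi}{dt}\right)\Big|_{t=0} = A^* + A = 0.$ □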

Next, we consider the special linear group $SL_n$. The one-parameter groups in $SL_n$
have the form $\varphi(t) = e^{At}$ where $A$ is a trace-zero matrix (9.5.10). Since $(e^{At})_{t=0} = I$ and
$\left(\frac{d}{dt}e^{At}\right)_{t=0} = A$, every trace-zero matrix $A$ is a tangent vector to $SL_n$ at the identity, an
element of its Lie algebra.

Lemma 9.6.2 Let $\varphi$ be a path in $GL_n$ with $\varphi(0) = I$ and $\varphi'(0) = A$. Then $\frac{d}{dt}(\det\varphi)\big|_{t=0} =$
trace $A$.


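Proof. We write the matrix entries of $\varphi$ as $\varphi_{ij}$, and we compute $\frac{d}{dt}\det\varphi$ using the complete
expansion (1.6.4) of the determinant:

$\det\varphi = \sum_{p \in S_n} (\text{sign}\, p)\, \varphi_{1,p1} \cdots \varphi_{n,pn}.$

By the product rule,

(9.6.3) $\frac{d}{dt}\left(\varphi_{1,p1} \cdots \varphi_{n,pn}\right) = \sum_{i=1}^{n} \varphi_{1,p1} \cdots \varphi'_{i,pi} \cdots \varphi_{n,pn}.$

We evaluate at $t = 0$. Since $\varphi(0) = I$, $\varphi_{ij}(0) = 0$ if $i \ne j$ and $\varphi_{ii}(0) = 1$. So in the sum
(9.6.3), the term $\varphi_{1,p1} \cdots \varphi'_{i,pi} \cdots \varphi_{n,pn}$ evaluates to zero unless $pj = j$ for all $j \ne i$, and
if $pj = j$ for all $j \ne i$, then since $p$ is a permutation, $pi = i$ too, and therefore $p$ is the
identity. So (9.6.3) evaluates to zero except when $p = 1$, and when $p = 1$, it becomes
$\sum_i \varphi'_{ii}(0) = \text{trace}\, A$. This is the derivative of $\det\varphi$. □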


Proposition 9.6.4 The Lie algebra of the special linear group $SL_n$ consists of the trace-zero
matrices.


Proof. If $\varphi$ is a path in the special linear group with $\varphi(0) = I$ and $\varphi'(0) = A$, then
$\det(\varphi(t)) = 1$ identically, and therefore $\frac{d}{dt}\det(\varphi(t)) = 0$. Evaluating at $t = 0$, we obtain
trace $A = 0$. □
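Lemma 9.6.2, on which this proof depends, applies to any differentiable path through the identity, not only to one-parameter groups. The sketch below tests it on the polynomial path $I + tA + t^2B$ with a central difference quotient; the path, the step size `h`, and the sample matrices are all assumptions chosen for the illustration.

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def path(t, A, B):
    """A differentiable path of 2x2 matrices with path(0) = I and velocity A at t = 0."""
    return [
        [(1.0 if i == j else 0.0) + t * A[i][j] + t * t * B[i][j] for j in range(2)]
        for i in range(2)
    ]

def derivative_of_det_at_zero(A, B, h=1e-5):
    """Central difference approximation to d/dt det(path(t)) at t = 0."""
    return (det2(path(h, A, B)) - det2(path(-h, A, B))) / (2 * h)

A = [[1.0, 2.0], [3.0, 4.0]]        # trace 5
B = [[0.5, -1.0], [2.0, 7.0]]       # arbitrary second-order term
estimate = derivative_of_det_at_zero(A, B)
```

The estimate is close to 5, the trace of `A`, and it does not depend on the second-order term `B`, as the lemma predicts.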


Similar methods are used to describe the Lie algebras of other classical groups. Note 
also that the Lie algebras of $O_n$ and $SL_n$ are real vector spaces, subspaces of the space
of matrices. It is usually easy to verify for other groups that Lie(G) is a real vector 
space. 


The Lie Bracket 


The Lie algebra has an additional structure, an operation called the bracket, the law of 
composition defined by the rule 


(9.6.5) [A, B] = AB — BA. 


The bracket is a version of the commutator: It is zero if and only if A and B commute. It isn’t 
an associative law, but it satisfies an identity called the Jacobi identity: 


(9.6.6) [A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0.


To show that the bracket is defined on the Lie algebra, we must check that if A and 
B are in Lie(G), then [A, B] is also in Lie(G). This can be done easily for any particular 
group. For the special linear group, the required verification is that if A and B have trace 
zero, then $AB - BA$ also has trace zero, which is true because trace $AB$ = trace $BA$. The Lie
algebra of the orthogonal group is the space of skew-symmetric matrices. For that group, we
must verify that if $A$ and $B$ are skew-symmetric, then $[A, B]$ is skew-symmetric:

$[A, B]^t = (AB)^t - (BA)^t = B^tA^t - A^tB^t = (-B)(-A) - (-A)(-B) = BA - AB = -[A, B].$

The definition of an abstract Lie algebra includes a bracket operation.
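The closure properties and the Jacobi identity can be confirmed exactly on integer matrices. In the sketch below the helper names (`bracket`, `transpose`, `add`, `neg`, `trace`) and the three sample matrices are illustrative; `A` and `B` are skew-symmetric, while `C` is an arbitrary matrix.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def neg(A):
    return [[-a for a in row] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def bracket(A, B):
    """The Lie bracket [A, B] = AB - BA of (9.6.5)."""
    return add(mat_mul(A, B), neg(mat_mul(B, A)))

A = [[0, 1, -2], [-1, 0, 3], [2, -3, 0]]     # skew-symmetric
B = [[0, 4, 1], [-4, 0, -5], [-1, 5, 0]]     # skew-symmetric
C = [[1, 2, 0], [0, -1, 3], [4, 0, 0]]       # arbitrary

AB = bracket(A, B)
jacobi = add(add(bracket(A, bracket(B, C)), bracket(B, bracket(C, A))),
             bracket(C, bracket(A, B)))
```

Here `AB` is again skew-symmetric, every bracket has trace zero, and `jacobi` is the zero matrix.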


Definition 9.6.7 A Lie algebra $V$ is a real vector space together with a law of composition
$V \times V \to V$ denoted by $v, w \rightsquigarrow [v, w]$ and called the bracket, which satisfies these axioms
for all $u, v, w$ in $V$ and all $c$ in $\mathbb{R}$:

bilinearity: $[v_1 + v_2, w] = [v_1, w] + [v_2, w]$ and $[cv, w] = c[v, w]$,
             $[v, w_1 + w_2] = [v, w_1] + [v, w_2]$ and $[v, cw] = c[v, w]$,
skew symmetry: $[v, w] = -[w, v]$, or $[v, v] = 0$,
Jacobi identity: $[u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0$.


Lie algebras are useful because, being vector spaces, they are easier to work with 
than linear groups. And, though this is not easy to prove, many linear groups, including the 
classical groups, are nearly determined by their Lie algebras. 


9.7 TRANSLATION IN A GROUP 


Let P be an element of a matrix group G. Left multiplication by P is a bijective map from G
to itself:


(9.7.1) m_P: G → G, X ⇝ PX.


Its inverse function is left multiplication by P^{-1}. The maps m_P and m_{P^{-1}} are continuous
because matrix multiplication is continuous. Thus m_P is a homeomorphism from G to G
(not a homomorphism). It is also called left translation by P, in analogy with translation in
the plane, which is left translation in the additive group R^{2+}.

The important property of a group that is implied by the existence of these maps is 
homogeneity. Multiplication by P is a homeomorphism that carries the identity element I
to P. Intuitively, the group looks the same at P as it does at I, and since P is arbitrary, it
looks the same at any two points. This is analogous to the fact that the plane looks the same 
everywhere. 

Left multiplication in the circle group SO_2 rotates the circle, and left multiplication
in SU_2 is also a rigid motion of the 3-sphere. But homogeneity is weaker in other matrix
groups. For example, let G be the group of real invertible diagonal 2 × 2 matrices. If we
identify the elements of G with the points (a, d) in the plane and not on the coordinate axes, 
multiplication by the matrix 


(9.7.2) P= E | 


distorts the group G, but it does this continuously.


(9.7.3) Left Multiplication in a Group. 


Now the only geometrically reasonable subsets of R^k that have such a homogeneity
property are manifolds. A manifold M of dimension d is a set in which every point has a
neighborhood that is homeomorphic to an open set in R^d (see [Munkres, p. 225]). It isn’t
surprising that the classical groups are manifolds, though there are subgroups of GL_n that
aren’t. The group GL_n(Q) of invertible matrices with rational coefficients is an interesting
group, but it is a countable dense subset of the space of matrices.

The following theorem gives a satisfactory answer to the question of which linear 
groups are manifolds: 


Theorem 9.7.4 A subgroup of GL_n that is a closed subset of GL_n is a manifold.


Proving this theorem here would take us too far afield, but we illustrate it by showing 
that the orthogonal groups are manifolds. Proofs for the other classical groups are similar. 


Lemma 9.7.5 The matrix exponential A ⇝ e^A maps a small neighborhood U of 0 in R^{n×n}
homeomorphically to a neighborhood V of I in GL_n(R).


The fact that the exponential series converges uniformly on bounded sets of matrices implies
that it is a continuous function ([Rudin] Thm 7.12). To prove the lemma, one needs to show
that it has a continuous inverse function for matrices sufficiently near to I. This can be proved
using the inverse function theorem, or the series for log(1 + x):


(9.7.6) log(1 + x) = x − (1/2)x^2 + (1/3)x^3 − ⋯ .


The series log(I + B) converges for small matrices B, and it inverts the exponential. □
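The claim that the logarithm series inverts the exponential near the identity can be tried numerically. The sketch below is an illustration under stated assumptions: the truncation lengths (30 and 60 terms), the sample matrix A, and the helper names are choices made here, and floating-point agreement is only evidence, not a proof of convergence.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, c):
    return [[c * x for x in row] for row in a]

def mat_exp(a, terms=30):
    """Partial sum I + A + A^2/2! + ... of the exponential series."""
    result = identity(len(a))
    term = identity(len(a))
    for k in range(1, terms):
        term = scale(matmul(term, a), 1.0 / k)   # term is now A^k / k!
        result = add(result, term)
    return result

def mat_log(m, terms=60):
    """Partial sum B - B^2/2 + B^3/3 - ... of log(I + B), where B = M - I."""
    n = len(m)
    b = add(m, scale(identity(n), -1.0))
    result = [[0.0] * n for _ in range(n)]
    power = identity(n)
    for k in range(1, terms):
        power = matmul(power, b)                 # power is now B^k
        sign = 1.0 if k % 2 == 1 else -1.0
        result = add(result, scale(power, sign / k))
    return result

# A hypothetical small matrix near 0.
A = [[0.10, -0.20], [0.05, 0.15]]
recovered = mat_log(mat_exp(A))
max_error = max(abs(recovered[i][j] - A[i][j])
                for i in range(2) for j in range(2))
```

For matrices that are not small the logarithm series need not converge, which is why the lemma is a statement about neighborhoods.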


(9.7.7) The Matrix Exponential.

Proposition 9.7.8 The orthogonal group O_n is a manifold of dimension (1/2)n(n − 1).


Proof. We denote the group O_n by G, and its Lie algebra, the space of skew-symmetric
matrices, by L. If A is skew-symmetric, then e^A is orthogonal (9.5.8). So the exponential
maps L to G. Conversely, suppose that A is near 0. Then, denoting transpose by *, A* and
−A are also near zero, and e^{A*} and e^{−A} are near to I. If e^A is orthogonal, i.e., if e^{A*} = e^{−A},
Lemma (9.7.5) tells us that A* = −A, so A is skew-symmetric. Therefore a matrix A near 0
is in L if and only if e^A is in G. This shows that the exponential defines a homeomorphism
from a neighborhood V of 0 in L to a neighborhood U of I in G. Since L is a vector space,
it is a manifold. The condition for a manifold is satisfied by the orthogonal group at the
identity. Homogeneity implies that it is satisfied at all points. Therefore G is a manifold, and
its dimension is the same as that of L, namely (1/2)n(n − 1). □
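The first step of the proof, that the exponential of a skew-symmetric matrix is orthogonal, can be illustrated numerically, and the dimension count is the number of entries above the diagonal. This is a minimal sketch: the sample matrix, the 30-term truncation, and the helper names are assumptions made for the example.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def mat_exp(a, terms=30):
    """Partial sum of the exponential series I + A + A^2/2! + ..."""
    n = len(a)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        prod = matmul(term, a)
        term = [[x / k for x in row] for row in prod]   # A^k / k!
        result = [[x + y for x, y in zip(r, t)] for r, t in zip(result, term)]
    return result

def skew_dimension(n):
    """Free entries of a skew-symmetric n x n matrix: those above the diagonal."""
    return len([(i, j) for i in range(n) for j in range(i + 1, n)])

# A hypothetical skew-symmetric 3 x 3 matrix.
A = [[0.0, 0.3, -0.1], [-0.3, 0.0, 0.2], [0.1, -0.2, 0.0]]
P = mat_exp(A)
PtP = matmul(transpose(P), P)
deviation = max(abs(PtP[i][j] - (1.0 if i == j else 0.0))
                for i in range(3) for j in range(3))
```

Here deviation measures how far P^t P is from the identity, and skew_dimension(n) agrees with (1/2)n(n − 1).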


Here is another application of the principle of homogeneity. 


Proposition 9.7.9 Let G be a path-connected matrix group, and let H be a subgroup of G 
that contains a nonempty open subset U of G. Then H = G. 


Proof. A subset S of R^k is path connected if any two points of S can be joined by a continuous
path lying entirely in S (see [Munkres, p. 155] or Chapter 2, Exercise M.6). 

Since left multiplication by an element g is a homeomorphism from G to G, the set 
gU is also open, and it is contained in a single coset of H, namely in gH. Since the translates 
of U cover G, the ones contained in a coset C cover that coset. So each coset is a union 
of open subsets of G, and therefore is open itself. Then G is partitioned into open subsets, 
the cosets of H. A path-connected set is not a disjoint union of proper open subsets (see 
[Munkres, p. 155]). Thus there can be only one coset, and H = G. □


We use this proposition to determine the normal subgroups of SU_2.


Theorem 9.7.10 


(a) The only proper normal subgroup of SU_2 is its center {±I}.
(b) The rotation group SO_3 is a simple group.


Proof. (a) Let N be a normal subgroup of SU_2 that contains an element P ≠ ±I. We must
show that N is equal to SU_2. Since N is normal, it contains the conjugacy class C of P, which
is a latitude, a 2-sphere.

We choose a continuous map P(t) from the unit interval [0, 1] to C such that P(0) = P
and P(1) ≠ P, and we form the path Q(t) = P(t)P^{-1}. Then Q(0) = I, and Q(1) ≠ I, so this
path leads out from the identity I, as in the figure below. Since N is a group that contains
P and P(t), it also contains Q(t) for every t in the interval [0, 1]. We don’t need to know
anything else about the path Q(t).

We note that trace Q ≤ 2 for any Q in SU_2, and that I is the only matrix with trace equal
to 2. Therefore trace Q(0) = 2 and trace Q(1) = τ < 2. By continuity, all values between τ
and 2 are taken on by trace Q(t). Since N is normal, it contains the conjugacy class of Q(t)
for every t. Therefore N contains all elements of SU_2 whose traces are sufficiently near to 2,
and this includes all matrices near to the identity. So N contains an open neighborhood of
the identity in SU_2. Since SU_2 is path-connected, Proposition 9.7.9 shows that N = SU_2.


(b) There is a surjective map φ: SU_2 → SO_3 whose kernel is {±I} (9.4.1). By the
Correspondence Theorem 2.10.5, the inverse image of a normal subgroup in SO_3 is a normal
subgroup of SU_2 that contains {±I}. Part (a) tells us that there are no proper normal subgroups of
SU_2 except {±I}, so SO_3 contains no proper normal subgroup at all. □


One can apply translation in a group G to tangent vectors too. If A is a tangent vector 
at the identity and if P is an element of G, the vector PA is tangent to G at P, and if A isn’t 
zero, neither is PA. As P ranges over the group, the family of these vectors forms what is 
called a tangent vector field. Now just the existence of a continuous tangent vector field that is 
nowhere zero puts strong restrictions on the space G. It is a theorem of topology, sometimes 
called the “Hairy Ball Theorem,” that any tangent vector field on the 2-sphere must vanish
at some point (see [Milnor]). This is one reason that the 2-sphere has no group structure. 
But since the 3-sphere is a group, it has tangent vector fields that are nowhere zero. 


9.8 NORMAL SUBGROUPS OF SL2 


Let F be a field. The center of the group SL_2(F) is {±I}. (This is Exercise 8.5.) The quotient
group SL_2(F)/{±I} is called the projective group, and is denoted by PSL_2(F). Its elements
are the cosets {±P}.


Theorem 9.8.1 Let F be a field of order at least four. 


(a) The only proper normal subgroup of SL_2(F) is its center Z = {±I}.
(b) The projective group PSL_2(F) is a simple group.


Part (b) of the theorem follows from (a) and the Correspondence Theorem 2.10.5,
and it identifies an interesting class of finite simple groups: the projective groups PSL_2(F)
when F is a finite field. The other finite, nonabelian simple groups that we have seen are the
alternating groups (7.5.4). 

We will show in Chapter 15 that the order of a finite field is always a power of a
prime, that for every prime power q = p^e, there is a field F_q of order q, and that F_q has
characteristic p (Theorem 15.7.3). Finite fields of order 2^e have characteristic 2. In those
fields, 1 = −1 and I = −I. Then the center of SL_2(F_q) is the trivial group. Let’s assume these
facts for now.

We omit the proof of the next lemma. (See Chapter 3, Exercise 4.4 for the case that q
is a prime.)


Lemma 9.8.2 Let q be a power of a prime. The order of SL_2(F_q) is q^3 − q. If q is not a power
of 2, the order of PSL_2(F_q) is (1/2)(q^3 − q). If q is a power of 2, then PSL_2(F_q) ≈ SL_2(F_q),
and the order of PSL_2(F_q) is q^3 − q. □


The orders of PSL_2 for small q are listed below, along with the orders of the first three
simple alternating groups.


|F|        4    5    7    8    9   11    13    16    17    19
|PSL_2|   60   60  168  504  360  660  1092  4080  2448  3420


n        5    6     7
|A_n|   60  360  2520


The orders of the ten smallest nonabelian simple groups appear in this list. The next smallest
would be PSL_3(F_3), which has order 5616.
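The entries of the table follow from the formulas of Lemma 9.8.2, and for prime q the order of SL_2(F_q) can be confirmed by counting. The sketch below is an illustration only: the function names are hypothetical, and the brute-force count treats F_p as the integers modulo a prime p, so it does not cover the fields of order 4, 8, 9, or 16.

```python
from itertools import product

def order_sl2(q):
    """Order of SL_2(F_q) according to Lemma 9.8.2."""
    return q ** 3 - q

def order_psl2(q):
    """Order of PSL_2(F_q); q is assumed to be a prime power.

    The center {±I} has order 2 unless q is a power of 2, where I = -I.
    """
    return order_sl2(q) if q % 2 == 0 else order_sl2(q) // 2

def count_sl2_mod_p(p):
    """Count 2 x 2 matrices with determinant 1 over the integers mod a prime p."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p == 1)

table = {q: order_psl2(q) for q in (4, 5, 7, 8, 9, 11, 13, 16, 17, 19)}
brute_force_5 = count_sl2_mod_p(5)
brute_force_7 = count_sl2_mod_p(7)
```

The dictionary table reproduces the row of orders above, and the two brute-force counts match q^3 − q for q = 5 and q = 7.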

The projective group is not simple when |F| = 2 or 3. PSL_2(F_2) is isomorphic to the
symmetric group S_3 and PSL_2(F_3) is isomorphic to the alternating group A_4.

As shown in these tables, PSL_2(F_4), PSL_2(F_5), and A_5 have order 60. These three
groups happen to be isomorphic. (This is Exercise 8.3.) The other coincidence among orders
is between the groups PSL_2(F_9) and A_6, which have order 360. They are isomorphic too.


For the proof, we will leave the cases |F| = 4 and 5 aside, so that we can use the next
lemma. 


Lemma 9.8.3 A field F of order greater than 5 contains an element r whose square is not 
0,1, or -1. 


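Proof. The only element with square 0 is 0, and the elements with square 1 are ±1. There
are at most two elements whose squares are −1: If a^2 = b^2 = −1, then (a − b)(a + b) = 0, so
b = ±a. So at most five elements have square 0, 1, or −1, and a field of order greater than 5
contains an element r that is not among them. □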
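For prime fields the lemma can be checked by direct search. The following sketch assumes F_p is modeled as the integers modulo a prime p (so fields such as F_8 or F_9 are not covered), and the function name good_elements is hypothetical.

```python
def good_elements(p):
    """Elements r of F_p (p prime) whose square is not 0, 1, or -1."""
    excluded = {0, 1 % p, (p - 1) % p}
    return [r for r in range(p) if (r * r) % p not in excluded]

# For the primes greater than 5 tried here, a suitable r exists.
found = {p: good_elements(p) for p in (7, 11, 13, 17, 19, 23)}

# For p = 5 every square is 0, 1, or -1, which is why that case is set aside.
none_for_5 = good_elements(5)
```

In F_7, for example, the search returns 2, 3, 4, 5, whose squares are 4 and 2.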


Proof of Theorem 9.8.1. We assume given the field F, we let SL_2 and PSL_2 stand for
SL_2(F) and PSL_2(F), respectively, and we denote the space F^2 by V. We choose a nonzero
element r of F whose square s is not ±1.

Let N be a normal subgroup of SL_2 that contains an element A ≠ ±I. We must show
that N is the whole group SL_2. Since A is arbitrary, it is hard to work with directly. The
strategy is to begin by showing that N contains a matrix that has eigenvalue s.


Step 1: There is a matrix P in SL_2 such that the commutator B = APA^{-1}P^{-1} is in N, and has
eigenvalues s and s^{-1}.


This is a nice trick. We choose a vector v_1 in V that is not an eigenvector of A and we
let v_2 = Av_1. Then v_1 and v_2 are independent, so 𝐁 = (v_1, v_2) is a basis of V. (It is easy to
check that the only matrices in SL_2 for which every vector is an eigenvector are I and −I.)

Let R be the diagonal matrix with diagonal entries r and r^{-1}. The matrix P = [𝐁]R[𝐁]^{-1}
has determinant 1, and v_1 and v_2 are eigenvectors, with eigenvalues r and r^{-1}, respectively
(4.6.10). Because N is a normal subgroup, the commutator B = APA^{-1}P^{-1} is an element of
N (see (7.5.4)). Then


Bv_2 = APA^{-1}P^{-1}v_2 = APA^{-1}(rv_2) = rAPv_1 = r^2 Av_1 = sv_2.


Therefore s is an eigenvalue of B. Because det B = 1, the other eigenvalue is s^{-1}.
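Step 1 can be traced through in a concrete case. The sketch below works over the integers modulo 7 and uses a hypothetical choice of A, v_1, and r; these values, and the helper names, are assumptions made for illustration. The modular inverse uses the three-argument form of pow with exponent −1, which requires Python 3.8 or later.

```python
p = 7  # work in the prime field F_7

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]

def det(a):
    return (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % p

def inverse(a):
    """Inverse of an invertible 2 x 2 matrix mod p, via the adjugate."""
    d_inv = pow(det(a), -1, p)
    return [[a[1][1] * d_inv % p, -a[0][1] * d_inv % p],
            [-a[1][0] * d_inv % p, a[0][0] * d_inv % p]]

def apply(a, v):
    return [(a[0][0] * v[0] + a[0][1] * v[1]) % p,
            (a[1][0] * v[0] + a[1][1] * v[1]) % p]

A = [[1, 1], [0, 1]]          # hypothetical element of SL_2, not ±I
v1 = [0, 1]                   # not an eigenvector of A
v2 = apply(A, v1)             # v2 = A v1
r = 2                         # r^2 = 4 is not ±1 mod 7
s = r * r % p

basis = [[v1[0], v2[0]], [v1[1], v2[1]]]      # columns are v1 and v2
R = [[r, 0], [0, pow(r, -1, p)]]
P = matmul(matmul(basis, R), inverse(basis))  # P v1 = r v1, P v2 = r^{-1} v2

# The commutator B = A P A^{-1} P^{-1} of the text.
commutator = matmul(matmul(A, P), matmul(inverse(A), inverse(P)))
image_of_v2 = apply(commutator, v2)
```

With these choices the commutator is the matrix with rows (6, 5) and (4, 0) modulo 7, and it sends v_2 to s v_2 as in the displayed computation.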


Step 2: The matrices having eigenvalues s and s^{-1} form a single conjugacy class C in SL_2,
and this conjugacy class is contained in N.


The elements s and s^{-1} are distinct because s ≠ ±1. Let S be a diagonal matrix with
diagonal entries s and s^{-1}. Every matrix Q with eigenvalues s and s^{-1} is a conjugate of S in
GL_2(F) (4.4.8)(b), say Q = LSL^{-1}. Since S is diagonal, it commutes with any other diagonal
matrix. We can multiply L on the right by a suitable diagonal matrix, to make det L = 1,
while preserving the equation Q = LSL^{-1}. So Q is a conjugate of S in SL_2. This shows that
the matrices with eigenvalues s and s^{-1} form a single conjugacy class. By Step 1, the normal
subgroup N contains one such matrix. So C ⊂ N.


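Step 3: The elementary matrices E = \begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix} and E^t = \begin{bmatrix} 1 & 0 \\ x & 1 \end{bmatrix}, with x in F, are in N.


For any element x of F, the terms on the left side of the equation


\begin{bmatrix} s^{-1} & 0 \\ 0 & s \end{bmatrix} \begin{bmatrix} s & sx \\ 0 & s^{-1} \end{bmatrix} = \begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix} = E

are in C and in N, so E is in N. One sees similarly that E^t is in N.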
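The factorization used in Step 3 is easy to confirm in a small field. The sketch below assumes the field is the integers modulo 7 with s = 4 (the square of r = 2), and it tests membership in C by comparing determinant and trace, which for a 2 × 2 matrix determine the characteristic polynomial. The names are hypothetical.

```python
p = 7
s = 4                       # s = r^2 with r = 2, so s is not ±1 mod 7
s_inv = pow(s, -1, p)       # modular inverse (Python 3.8+)

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]

def has_eigenvalues_s(m):
    """True if m has determinant 1 and trace s + s^{-1} mod p."""
    d = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % p
    t = (m[0][0] + m[1][1]) % p
    return d == 1 and t == (s + s_inv) % p

checks = []
for x in range(p):
    first = [[s_inv, 0], [0, s]]
    second = [[s, s * x % p], [0, s_inv]]
    E = [[1, x], [0, 1]]
    checks.append(matmul(first, second) == E
                  and has_eigenvalues_s(first)
                  and has_eigenvalues_s(second))

all_ok = all(checks)
```

Transposing both factors and reversing their order gives the corresponding factorization of E^t.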


Step 4: The matrices E and E^t, with x in F, generate SL_2. Therefore N = SL_2.


The proof of this is Exercise 4.8 of Chapter 2. □


As is shown by the alternating groups and the projective groups, simple groups arise 
frequently, and this is one of the reasons that they have been studied intensively. On the 
other hand, simplicity is a very strong restriction on a group. There couldn’t be too many of 
them. A famous theorem of Cartan is one manifestation of this. 

A complex algebraic group is a subgroup of the complex general linear group GL_n(C)
which is the locus of complex solutions of a finite system of complex polynomial equations 
in the matrix entries. Cartan’s theorem lists the simple complex algebraic groups. In the 
statement of the theorem, we use the symbol Z to denote the center of a group. 


Theorem 9.8.4 


(a) The centers of the groups SL_n(C), SO_n(C), and SP_{2n}(C) are finite cyclic groups.


(b) For n > 1, the groups SL_n(C)/Z, SO_n(C)/Z, and SP_{2n}(C)/Z are path-connected
complex algebraic groups. Except for SO_2(C)/Z and SO_4(C)/Z, they are simple.

(c) In addition to the isomorphism classes of these groups, there are exactly five isomorphism
classes of simple, path-connected complex algebraic groups, called the exceptional
groups.


Theorem 9.8.4 is based on a classification of the corresponding Lie algebras. It is too hard to 
prove here. 


A large project, the classification of the finite simple groups, was completed in 1980.
The finite simple groups we have seen are the groups of prime order, the alternating groups
A_n with n ≥ 5, and the groups PSL_2(F) when F is a finite field of order at least 4. Matrix
groups play a dominant role in the classification of the finite simple groups too. Each of the
forms (9.8.4) leads to a whole series of finite simple groups when finite fields are substituted
for the complex field. There are also some finite simple groups analogous to the unitary
groups. All of these finite linear groups are said to be of Lie type. In addition to the groups
of prime order, the alternating groups, and the groups of Lie type, there are 26 finite simple
groups called the sporadic groups. The smallest sporadic group is the Mathieu group M_11,
whose order is 7920. The largest, the Monster, has order roughly 8 × 10^53.


It seems unfair to crow about the successes of a theory 
and to sweep all its failures under the rug. 


—Richard Brauer 


EXERCISES 


Section 1 The Classical Linear Groups
1.1. (a) Is GL_n(C) isomorphic to a subgroup of GL_{2n}(R)?
(b) Is SO_2(C) a bounded subset of C^{2×2}?


1.2. A matrix P is orthogonal if and only if its columns form an orthonormal basis. Describe
the properties of the columns of a matrix in the Lorentz group O_{3,1}.


1.3. Prove that there is no continuous isomorphism from the orthogonal group O_4 to the
Lorentz group O_{3,1}.


1.4. Describe by equations the group O_{1,1} and show that it has four path-connected
components.


1.5. Prove that SP_2 = SL_2, but that SP_4 ≠ SL_4.
1.6. Prove that the following matrices are symplectic, if the blocks are n × n:


F “| ; |“ «| : ! “a where B = B' and A is invertible. 


*1.7. Prove that 


(a) the symplectic group SP_{2n} operates transitively on R^{2n},
(b) SP_{2n} is path-connected, (c) symplectic matrices have determinant 1.

Section 2 Interlude: Spheres


2.1. Compute the formula for the inverse of the stereographic projection π: S^3 → R^3.


2.2. One can parametrize proper subspaces of R^2 by a circle in two ways. First, if a subspace
W intersects the horizontal axis with angle θ, one can use the double angle α = 2θ. The
double angle eliminates the ambiguity between θ and θ + π. Or, one can choose a nonzero
vector (y_1, y_2) in W, and use the inverse of stereographic projection to map the slope
λ = y_2/y_1 to a point of S^1. Compare these two parametrizations.


2.3. (unit vectors and subspaces in C^2) A proper subspace W of the vector space C^2 has
dimension 1. Its slope is defined to be λ = y_2/y_1, where (y_1, y_2) is a nonzero vector in
W. The slope can be any complex number, or when y_1 = 0, λ = ∞.


(a) Let z = v_1 + v_2 i. Write the formula for stereographic projection π (9.2.2) and its
inverse function σ in terms of z.

(b) The function that sends a unit vector (y_1, y_2) to σ(y_2/y_1) defines a map from the
unit sphere S^3 in C^2 to the two-sphere S^2. This map can be used to parametrize
subspaces by points of S^2. Compute the function σ(y_2/y_1) on unit vectors (y_1, y_2).

(c) What pairs of points of S^2 correspond to pairs of subspaces W and W’ that are
orthogonal with respect to the standard Hermitian form on C^2?


Section 3 The Special Unitary Group SU_2


3.1. Let P and Q be elements of SU_2, represented by the real vectors (x_0, x_1, x_2, x_3)
and (y_0, y_1, y_2, y_3), respectively. Compute the real vector that corresponds to the
product PQ.


3.2. Prove that U_2 is homeomorphic to the product S^3 × S^1.
3.3. Prove that every great circle in SU_2 (circle of radius 1) is a coset of one of the longitudes.


3.4. Determine the centralizer of j in SU_2.


Section 4 The Rotation Group SO_3


4.1. Let W be the space of real skew-symmetric 3 × 3 matrices. Describe the orbits for the
operation P * A = PAP^t of SO_3 on W.


4.2. The rotation group SO_3 may be mapped to a 2-sphere by sending a rotation matrix to its
first column. Describe the fibres of this map.


4.3. Extend the orthogonal representation φ: SU_2 → SO_3 to a homomorphism
Φ: U_2 → SO_3, and describe the kernel of Φ.


4.4. (a) With notation as in (9.4.1), compute the matrix of the rotation γ_P, and show that its
trace is 1 + 2 cos 2θ.

(b) Prove directly that the matrix is orthogonal.

4.5. Prove that conjugation by an element of SU_2 rotates every latitude.


4.6. Describe the conjugacy classes in SO_3 in two ways:

(a) Its elements operate on R^3 as rotations. Which rotations make up a conjugacy
class?

(b) The spin homomorphism SU_2 → SO_3 can be used to relate the conjugacy classes in
the two groups. Do this.

(c) The conjugacy classes in SU_2 are spheres. Describe the conjugacy classes in SO_3
geometrically. Be careful.


4.7. (a) Calculate left multiplication by a fixed matrix P in SU_2 explicitly, in terms of the
coordinate vector (x_0, x_1, x_2, x_3). Prove that it is given as multiplication by a 4 × 4
orthogonal matrix Q.

(b) Prove that Q is orthogonal by a method similar to that used in describing the
orthogonal representation: Express the dot product of the vectors (x_0, x_1, x_2, x_3) and
(x_0’, x_1’, x_2’, x_3’) that correspond to matrices P and P’ in SU_2, in matrix terms.


4.8. Let W be the real vector space of Hermitian 2 × 2 matrices.

(a) Prove that the rule P · A = PAP* defines an operation of SL_2(C) on W.

(b) Prove that the function ⟨A, A’⟩ = det(A + A’) − det A − det A’ is a bilinear form on
W, and that its signature is (3, 1).

(c) Use (a) and (b) to define a homomorphism φ: SL_2(C) → O_{3,1}, whose kernel is {±I}.


*4.9. (a) Let H_i be the subgroup of SO_3 of rotations about the x_i-axis, i = 1, 2, 3. Prove that
every element of SO_3 can be written as a product ABA’, where A and A’ are in H_1 and
B is in H_2. Prove that this representation is unique unless B = I.

(b) Describe the double cosets H_1QH_1 geometrically (see Chapter 2, Exercise M.9).


Section 5 One-Parameter Groups


5.1. Can the image of a one-parameter group in GL_n cross itself?
5.2. Determine the one-parameter groups in U_2.
5.3. Describe by equations the images of the one-parameter groups in the group of real,
invertible, 2 × 2 diagonal matrices, and make a drawing showing some of them in the
plane.


5.4. Find the conditions on a matrix A so that e^{tA} is a one-parameter group in

(a) the special unitary group SU_n, (b) the Lorentz group O_{3,1}.


5.5. Let G be the group of real matrices of the form [* z | with x > 0.

(a) Determine the matrices A such that e^{tA} is a one-parameter group in G.
(b) Compute e^{tA} explicitly for the matrices in (a).
(c) Make a drawing showing some one-parameter groups in the (x, y)-plane.


5.6. Let G be the subgroup of GL_2 of matrices [* an with x > 0 and y arbitrary.
Determine the conjugacy classes in G, and the matrices A such that e^{tA} is a one-parameter
group in G.

5.7. Determine the one-parameter groups in the group of invertible n × n upper triangular
matrices.

5.8. Let φ(t) = e^{tA} be a one-parameter group in a subgroup G of GL_n. Prove that the cosets
of its image are matrix solutions of the differential equation dX/dt = AX.


5.9. Let φ: R^+ → GL_n be a one-parameter group. Prove that ker φ is either trivial, or an
infinite cyclic group, or the whole group.


5.10. Determine the differentiable homomorphisms from the circle group SO_2 to GL_n.


Section 6 The Lie Algebra


6.1. Verify the Jacobi identity for the bracket operation [A, B] = AB − BA.


6.2. Let V be a real vector space of dimension 2, with a law of composition [v, w] that is
bilinear and skew-symmetric (see (9.6.7)). Prove that the Jacobi identity holds.


6.3. The group SL_2 operates by conjugation on the space of trace-zero matrices. Decompose
this space into orbits.


6.4. Let G be the group of invertible real matrices of the form rE i | Determine the Lie
algebra L of G, and compute the bracket on L.


6.5. Show that the set defined by xy = 1 is a subgroup of the group of invertible diagonal 2 × 2
matrices, and compute its Lie algebra.


6.6. (a) Show that O_2 operates by conjugation on its Lie algebra.

(b) Show that this operation is compatible with the bilinear form ⟨A, B⟩ = (1/2) trace AB.

(c) Use the operation to define a homomorphism O_2 → O_1, and describe this
homomorphism explicitly.

6.7. Determine the Lie algebras of the following groups.

(a) U_n, (b) SU_n, (c) O_{3,1}, (d) SO_n(C).

6.8. Determine the Lie algebra of SP_{2n}, using block form M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}.

6.9. (a) Show that the vector cross product makes R^3 into a Lie algebra L_1.


(b) Let L_2 = Lie(SU_2), and let L_3 = Lie(SO_3). Prove that the three Lie algebras
L_1, L_2 and L_3 are isomorphic.


6.10. Classify complex Lie algebras of dimension ≤ 3.


6.11. Let B be a real n × n matrix, and let ⟨ , ⟩ be the bilinear form X^tBY. The orthogonal
group G of this form is defined to be the group of matrices P such that P^tBP = B.
Determine the one-parameter groups in G, and the Lie algebra of G.


Section 7 Translation in a Group


7.1. Prove that the unitary group U_n is path connected.

7.2. Determine the dimensions of the following groups:

(a) U_n, (b) SU_n, (c) SO_n(C), (d) O_{3,1}, (e) SP_{2n}.

7.3. Using the exponential, find all solutions near I of the equation P^2 = I.


7.4. Find a path-connected, nonabelian subgroup of GL_2 of dimension 2.


7.5. (a) Prove that the exponential map defines a bijection between the set of all Hermitian
matrices and the set of positive definite Hermitian matrices.


(b) Describe the topological structure of GL_2(C) using the Polar decomposition
(Chapter 8, Exercise M.8) and (a).


7.6. Sketch the tangent vector field PA to the group C^×, when A = 1 + i.


7.7. Let H be a finite normal subgroup of a path connected group G. Prove that H is contained
in the center of G.


Section 8 Normal Subgroups of SL_2


8.1. Prove Theorem 9.8.1 for the cases F = F_4 and F_5.

8.2. Describe isomorphisms PSL_2(F_2) ≈ S_3 and PSL_2(F_3) ≈ A_4.

8.3. (a) Determine the numbers of Sylow p-subgroups of PSL_2(F_5), for p = 2, 3, 5.
(b) Prove that the three groups A_5, PSL_2(F_4), and PSL_2(F_5) are isomorphic.

8.4. (a) Write the polynomial equations that define the symplectic group.


(b) Show that the unitary group U_n can be defined by real polynomial equations in the
real and imaginary parts of the matrix entries.


8.5. Determine the centers of the groups SL_n(R) and SL_n(C).
8.6. Determine all normal subgroups of GL_2(R) that contain its center.
8.7. With Z denoting the center of a group, is PSL_n(C) isomorphic to GL_n(C)/Z? Is
PSL_n(R) isomorphic to GL_n(R)/Z?
8.8. (a) Let P be a matrix in the center of SO_n, and let A be a skew-symmetric matrix. Prove
that PA = AP.
(b) Prove that the center of SO_n is trivial if n is odd and is {±I} if n is even and n ≥ 4.
8.9. Compute the orders of the groups
(a) SO_2(F_3), (b) SO_3(F_3), (c) SO_2(F_5), (d) SO_3(F_5).
*8.10. (a) Let V be the space of complex 2 × 2 matrices, with the basis (e_11, e_12, e_21, e_22).


Write the matrix of conjugation by A = E Al on V in block form.


(b) Prove that conjugation defines a homomorphism φ: SL_2(C) → GL_4(C), and that
the image of φ is isomorphic to PSL_2(C).

(c) Prove that PSL_2(C) is a complex algebraic group by finding polynomial equations
in the entries y_ij of a 4 × 4 matrix whose solutions are the matrices in the image of φ.


Miscellaneous Exercises


M.1. Let G = SL_2(R), let A = \begin{bmatrix} x & y \\ z & w \end{bmatrix} be a matrix in G, and let t be its trace. Substituting
t − x for w, the condition det A = 1 becomes x(t − x) − yz = 1. For fixed trace t, the
locus of solutions of this equation is a quadric in x, y, z-space. Describe the quadrics that
arise this way, and decompose them into conjugacy classes.


*M.2. Which elements of SL_2(R) lie on a one-parameter group?
M.3. Are the conjugacy classes in a path connected group G path connected?


M.4. Quaternions are expressions of the form α = a + bi + cj + dk, where a, b, c, d are real
numbers (see (9.3.3)).


(a) Let ᾱ = a − bi − cj − dk. Compute αᾱ.
(b) Prove that every α ≠ 0 has a multiplicative inverse.


(c) Prove that the set of quaternions α such that a^2 + b^2 + c^2 + d^2 = 1 forms a group
under multiplication that is isomorphic to SU_2.


M.5. The affine group A_n is the group of transformations of R^n generated by GL_n and the
group T_n of translations: t_a(x) = x + a. Prove that T_n is a normal subgroup of A_n and
that A_n/T_n is isomorphic to GL_n.


M.6. (Cayley transform) Let U denote the set of matrices A such that I + A is invertible, and
define A’ = (I − A)(I + A)^{-1}.


(a) Prove that if A is in U, then so is A’, and that (A’)’ = A.

(b) Let V denote the vector space of real skew-symmetric n × n matrices. Prove that the
rule A ⇝ (I − A)(I + A)^{-1} defines a homeomorphism from a neighborhood of 0 in
V to a neighborhood of I in SO_n.

(c) Is there an analogous statement for the unitary group?


(d) Let S = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}. Show that a matrix A in U is symplectic if and only if (A’)^t S = −SA’.


M.7. Let G = SL_2. A ray in R^2 is a half line leading from the origin to infinity. The rays are in
bijective correspondence with the points on the unit 1-sphere in R^2.


(a) Determine the stabilizer H of the ray {re_1 | r > 0}.

(b) Prove that the map f: H × SO_2 → G defined by f(P, B) = PB is a homeomorphism
(not a homomorphism).

(c) Use (b) to identify the topological structure of SL_2.

M.8. Two-dimensional space-time is the space of real three-dimensional column vectors, with
the Lorentz form ⟨Y, Y’⟩ = Y^t I_{2,1} Y’ = y_1y_1’ + y_2y_2’ − y_3y_3’.
The space W of real trace-zero 2 × 2 matrices has a basis 𝐁 = (w_1, w_2, w_3), where


w_1 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, w_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, w_3 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.


(a) Show that if A = 𝐁Y and A’ = 𝐁Y’ are trace-zero matrices, the Lorentz form carries
over to ⟨A, A’⟩ = y_1y_1’ + y_2y_2’ − y_3y_3’ = (1/2) trace(AA’).

(b) The group SL_2 operates by conjugation on the space W. Use this operation to define
a homomorphism φ: SL_2 → O_{2,1} whose kernel is {±I}.


*(c) Prove that the Lorentz group O_{2,1} has four connected components and that the
image of φ is the component that contains the identity.


M.9. The icosahedral group is a subgroup of index 2 in the group G_1 of all symmetries of
a dodecahedron, including orientation-reversing symmetries. The alternating group A_5
is a subgroup of index 2 of the symmetric group G_2 = S_5. Finally, consider the spin
homomorphism φ: SU_2 → SO_3. Let G_3 be the inverse image of the icosahedral group in
SU_2. Are any of the groups G_i isomorphic?


*M.10. Let P be the matrix (9.3.1) in SU_2, and let T denote the subgroup of SU_2 of diagonal
matrices. Prove that if the entries a, b of P are not zero, then the double coset TPT
is homeomorphic to a torus, and describe the remaining double cosets (see Chapter 2,
Exercise M.9).


*M.11. The adjoint representation of a linear group G is the representation by conjugation on its
Lie algebra: G × L → L defined by P, A ⇝ PAP^{-1}. The form ⟨A, A’⟩ = trace(AA’) on
L is called the Killing form. For the following groups, verify that if P is in G and A is in
L, then PAP^{-1} is in L. Prove that the Killing form is symmetric and bilinear and that the
operation is compatible with the form, i.e., that ⟨A, A’⟩ = ⟨PAP^{-1}, PA’P^{-1}⟩.

(a) U_n, (b) O_{3,1}, (c) SO_n(C), (d) SP_{2n}.

*M.12. Determine the signature of the Killing form (Exercise M.11) on the Lie algebra of

(a) SU_n, (b) SO_n, (c) SL_n.


*M.13. Use the adjoint representation of SL_2(C) (Exercise M.11) to define an isomorphism
SL_2(C)/{±I} ≈ SO_3(C).


Group Representations


A tremendous effort has been made by mathematicians 
for more than a century to clear up the chaos in group theory. 
Still, we cannot answer some of the simplest questions. 


—Richard Brauer 


Group representations arise in mathematics and in other sciences when a structure with 
symmetry is being studied. If one makes all possible measurements of some sort (in 
chemistry, it might be vibrations of a molecule) and assembles the results into a “state
vector,” a symmetry of the structure will transform that vector. This produces an operation
of the symmetry group on the space of vectors, a representation of the group, that can help 
to analyze the structure. 


10.1 DEFINITIONS 


In this chapter, GL_n denotes the complex general linear group GL_n(C).
A matrix representation of a group G is a homomorphism


(10.1.1) R: G → GL_n


from G to one of the complex general linear groups. The number n is the dimension of the 
representation. 

We use the notation R_g instead of R(g) for the image of a group element g. Each R_g
is an invertible matrix, and the statement that R is a homomorphism reads


(10.1.2) R_{gh} = R_g R_h.

If a group is given by generators and relations, say ⟨x_1, ..., x_n | r_1, ..., r_k⟩, a matrix
representation can be defined by assigning matrices R_{x_1}, ..., R_{x_n} that satisfy the relations.


For example, the symmetric group S_3 can be presented as ⟨x, y | x^3, y^2, xyxy⟩, so a
representation of S_3 is defined by matrices R_x and R_y such that R_x^3 = I, R_y^2 = I, and
R_xR_yR_xR_y = I. Some relations in addition to these required ones may hold.

Because $S_3$ is isomorphic to the dihedral group $D_3$, it has a two-dimensional matrix
representation that we denote by A. We place an equilateral triangle with its center at
the origin, and so that one vertex is on the $e_1$-axis. Then its group of symmetries will be
generated by the rotation $A_x$ with angle $2\pi/3$ and the reflection $A_y$ about the $e_1$-axis. With
$c = \cos 2\pi/3$ and $s = \sin 2\pi/3$,


(10.1.3)    $A_x = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}, \qquad A_y = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.


We call this the standard representation of the dihedral group $D_3$ and of $S_3$.
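
As a numerical sanity check (a sketch, not part of the argument), one can test the matrices of (10.1.3) against the defining relations $x^3 = 1$, $y^2 = 1$, $xyxy = 1$ of $S_3$. The helper names `matmul` and `close` are illustrative, and floating-point arithmetic is used, so equality is tested up to a small tolerance.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

c, s = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
A_x = [[c, -s], [s, c]]            # rotation with angle 2*pi/3
A_y = [[1.0, 0.0], [0.0, -1.0]]    # reflection about the e_1-axis
I2 = [[1.0, 0.0], [0.0, 1.0]]

A_x3 = matmul(A_x, matmul(A_x, A_x))   # image of x^3
A_y2 = matmul(A_y, A_y)                # image of y^2
A_xy = matmul(A_x, A_y)                # image of xy
A_xyxy = matmul(A_xy, A_xy)            # image of xyxy

relations_hold = close(A_x3, I2) and close(A_y2, I2) and close(A_xyxy, I2)
print(relations_hold)
```

Since the three relations hold, the assignment $x \mapsto A_x$, $y \mapsto A_y$ does extend to a homomorphism from $S_3$.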


• A representation R is faithful if the homomorphism $R: G \to GL_n$ is injective, and therefore
maps G isomorphically to its image, a subgroup of $GL_n$. The standard representation
of $S_3$ is faithful.


Our second representation of $S_3$ is the one-dimensional sign representation $\Sigma$. Its value
on a group element is the $1 \times 1$ matrix whose entry is the sign of the permutation:


(10.1.4)    $\Sigma_x = [1], \qquad \Sigma_y = [-1]$.


This is not a faithful representation.

Finally, every group has the trivial representation, the one-dimensional representation
that takes the value 1 identically:


(10.1.5)    $T_x = [1], \qquad T_y = [1]$.


There are other representations of $S_3$, including the representation by permutation
matrices and the representation as a group of rotations of $\mathbb{R}^3$. But we shall see that every
representation of this group can be built up out of the three representations A, $\Sigma$, and T.


Because they involve several matrices, each of which may have many entries, representations
are notationally complicated. The secret to understanding them is to throw out
most of the information that the matrices contain, keeping only one essential part, its trace,
or character.


• The character $\chi_R$ of a matrix representation R is the complex-valued function whose
domain is the group G, defined by $\chi_R(g) = \operatorname{trace} R_g$.


Characters are usually denoted by $\chi$ ('chi'). The characters of the three representations
of the symmetric group that we have defined are displayed below in tabular form, with the
group elements listed in their usual order.


                   1     x    x^2    y     xy   x^2y

    $\chi_T$       1     1     1     1     1     1
    $\chi_\Sigma$  1     1     1    -1    -1    -1
    $\chi_A$       2    -1    -1     0     0     0

(10.1.6)


Several interesting phenomena can be observed in this table: 


• The rows form orthogonal vectors of squared length six, which is also the order of $S_3$. The
distinct columns are orthogonal too.
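
The orthogonality claims can be checked directly from the matrices. The sketch below recomputes the characters of T, $\Sigma$, and A on the six elements $1, x, x^2, y, xy, x^2y$ and forms the relevant dot products. The function names are illustrative, and traces are rounded to integers, which is legitimate here only because the true values are integers.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def six_elements(rx, ry):
    """Images of 1, x, x^2, y, xy, x^2 y, given the images of x and y."""
    n = len(rx)
    one = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x2 = matmul(rx, rx)
    return [one, rx, x2, ry, matmul(rx, ry), matmul(x2, ry)]

def character(mats):
    """Traces of the matrices, rounded to the nearest integer."""
    return [round(sum(m[i][i] for i in range(len(m)))) for m in mats]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

c, s = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
table = {
    "T": character(six_elements([[1.0]], [[1.0]])),
    "Sigma": character(six_elements([[1.0]], [[-1.0]])),
    "A": character(six_elements([[c, -s], [s, c]], [[1.0, 0.0], [0.0, -1.0]])),
}

rows = list(table.values())
row_gram = [[dot(u, v) for v in rows] for u in rows]   # 6 on the diagonal, 0 elsewhere

# Columns of elements in the same conjugacy class coincide, so compare distinct columns only.
columns = sorted(set(zip(*rows)))
column_dots = [dot(u, v) for i, u in enumerate(columns) for v in columns[i + 1:]]
```

The Gram matrix of the rows is six times the identity, and every dot product of two distinct columns is zero.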


These astonishing facts illustrate the beautiful Main Theorem 10.4.6 on characters.

Two other phenomena are more elementary:

• $\chi_R(1)$ is the dimension of the representation, also called the dimension of the character.


Since a representation is a homomorphism, it sends the identity in the group to the identity
matrix. So $\chi_R(1)$ is the trace of the identity matrix.


• The characters are constant on conjugacy classes.


(The conjugacy classes in $S_3$ are the sets $\{1\}$, $\{x, x^2\}$, and $\{y, xy, x^2y\}$.)

This phenomenon is explained as follows: Let g and g' be conjugate elements of a
group G, say $g' = hgh^{-1}$. Because a representation R is a homomorphism, $R_{g'} = R_h R_g R_h^{-1}$.
So $R_{g'}$ and $R_g$ are conjugate matrices. Conjugate matrices have the same trace.


It is essential to work as much as possible without fixing a basis, and to facilitate this, 
we introduce the concept of a representation of a group on a vector space V. We denote by 


(10.1.7)    $GL(V)$


the group of invertible linear operators on V, the law of composition being composition of 
operators. We always assume that V is a finite-dimensional complex vector space, and not 
the zero space. 


• A representation of a group G on a complex vector space V is a homomorphism

(10.1.8)    $\rho: G \to GL(V)$.


So a representation assigns a linear operator to every group element. A matrix representation 
can be thought of as a representation of G on the space of column vectors. 

The elements of a finite rotation group (6.12) are rotations of a three-dimensional
Euclidean space V without reference to a basis, and these orthogonal operators give us what
we call the standard representation of the group. (We use this term in spite of the fact that,
for $D_3$, it conflicts with (10.1.3).) We also use the symbol $\rho$ for other representations, and
this will not imply that the operators $\rho_g$ are rotations.


If $\rho$ is a representation, we denote the image of an element g in GL(V) by $\rho_g$ rather
than by $\rho(g)$, to keep the symbol g out of the way. The result of applying $\rho_g$ to a vector v
will be written as


$\rho_g(v)$  or as  $\rho_g v$.

Since $\rho$ is a homomorphism,


(10.1.9)    $\rho_{gh} = \rho_g \rho_h$.


The choice of a basis $B = (v_1, \ldots, v_n)$ for a vector space V defines an isomorphism
from GL(V) to the general linear group $GL_n$:


(10.1.10)    $GL(V) \to GL_n$,  $T \mapsto$ matrix of T,


and a representation $\rho$ defines a matrix representation R, by the rule


(10.1.11)    $\rho_g \mapsto$ its matrix $= R_g$.

Thus every representation of G on a finite-dimensional vector space can be made into a
matrix representation, if we are willing to choose a basis. We may want to choose a basis in
order to make explicit calculations, but we must determine which properties are independent
of the basis, and which bases are the good ones.


A change of basis in V by a matrix P changes the matrix representation R associated
to $\rho$ to a conjugate representation $R' = P^{-1}RP$, i.e.,


(10.1.12)    $R'_g = P^{-1} R_g P$,

with the same P for every g in G. This follows from Rule 4.3.5 for a change of basis.


• An operation of a group G by linear operators on a vector space V is an operation on the
underlying set:


(10.1.13)    $1v = v$  and  $(gh)v = g(hv)$,


and in addition every group element acts as a linear operator. Writing out what this means,
we obtain the rules


(10.1.14)    $g(v + v') = gv + gv'$  and  $g(cv) = cgv$,


which, when added to (10.1.13), give a complete list of axioms for such an operation. We can
speak of orbits and stabilizers as before.

The two concepts "operation by linear operators on V" and "representation on V"
are equivalent. Given a representation $\rho$ of G on V, we can define an operation of G on
V by


(10.1.15)    $gv = \rho_g(v)$.


Conversely, given an operation, the same formula can be used to define the operator $\rho_g$.

We now have two notations (10.1.15) for the operation of g on v, and we use them
interchangeably. The notation gv is more compact, so we use it when possible, though it is
ambiguous because it doesn't specify $\rho$.


• An isomorphism from one representation $\rho: G \to GL(V)$ of a group G to another
representation $\rho': G \to GL(V')$ is an isomorphism of vector spaces $T: V \to V'$, an
invertible linear transformation, that is compatible with the operations of G:


(10.1.16)    $T(gv) = gT(v)$


for all v in V and all g in G. If $T: V \to V'$ is an isomorphism, and if B and B' are
corresponding bases of V and V', the associated matrix representations $R_g$ and $R'_g$ will be
equal.


The main topic of the chapter is the determination of the isomorphism classes 
of complex representations of a group G, representations on finite-dimensional, nonzero 
complex vector spaces. Any real matrix representation, such as one of the representations of 
S3 described above, can be used to define a complex representation, simply by interpreting 
the real matrices as complex matrices. We will do this without further comment. And except 
in the last section, our groups will be finite.


10.2 IRREDUCIBLE REPRESENTATIONS


Let $\rho$ be a representation of a finite group G on the (nonzero, finite-dimensional) complex
vector space V. A vector v is G-invariant if the operation of every group element fixes the
vector:


(10.2.1)    $gv = v$  or  $\rho_g(v) = v$,  for all g in G.


Most vectors aren't G-invariant. However, starting with any vector v, one can produce a
G-invariant vector by averaging over the group. Averaging is an important procedure that
will be used often. We used it once before, in Chapter 6, to find a fixed point of a finite group
operation on the plane. The G-invariant averaged vector is


(10.2.2)    $\tilde{v} = \frac{1}{|G|} \sum_{g \in G} gv$.


The reason for the normalization factor $\frac{1}{|G|}$ is that, if v happens to be G-invariant itself, then
$\tilde{v} = v$.

We verify that $\tilde{v}$ is G-invariant: Since the symbol g is used in the summation (10.2.2),
we write the condition for G-invariance as $h\tilde{v} = \tilde{v}$ for all h in G. The proof is based on
the fact that left multiplication by h defines a bijective map from G to itself. We make the
substitution $g' = hg$. Then as g runs through the elements of the group G, g' does too,
though in a different order, and


(10.2.3)    $h\tilde{v} = h \frac{1}{|G|} \sum_{g \in G} gv = \frac{1}{|G|} \sum_{g \in G} g'v = \frac{1}{|G|} \sum_{g' \in G} g'v = \tilde{v}$.


This reasoning can be confusing when one sees it for the first time, so we illustrate it
by an example, with $G = S_3$. We list the elements of the group as usual: $g = 1, x, x^2, y, xy, x^2y$.
Let $h = y$. Then $g' = hg$ lists the group in the order $g' = y, x^2y, xy, 1, x^2, x$. So


$\sum_{g \in G} g'v = yv + x^2yv + xyv + 1v + x^2v + xv = \sum_{g \in G} gv$.

The fact that multiplication by h is bijective implies that g' will always run over the group in
some order. Please study this reindexing trick.

The averaging process may fail to yield an interesting vector. It is possible that $\tilde{v} = 0$.
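
The averaging construction can be tried out concretely. The sketch below assumes a specific setting that is not fixed by the text: $S_3$ acts on $\mathbb{C}^3$ by permuting the standard basis vectors, and exact rational arithmetic is used. The names `act` and `average` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# The six elements of S_3, each stored as a tuple g with g[i] the image of index i.
G = list(permutations(range(3)))

def act(g, v):
    """Permutation action: g sends the basis vector e_i to e_{g[i]}."""
    w = [Fraction(0)] * len(v)
    for i, coeff in enumerate(v):
        w[g[i]] = coeff
    return w

def average(v):
    """The averaged vector (1/|G|) * sum over g in G of g v."""
    total = [Fraction(0)] * len(v)
    for g in G:
        total = [t + c for t, c in zip(total, act(g, v))]
    return [t / len(G) for t in total]

v = [Fraction(1), Fraction(2), Fraction(6)]
v_avg = average(v)   # every coordinate becomes the mean of 1, 2, 6

# Invariance: h * v_avg == v_avg for every h in G.
is_invariant = all(act(h, v_avg) == v_avg for h in G)

# Averaging can produce the zero vector: the coordinates of (1, -1, 0) sum to zero.
zero_avg = average([Fraction(1), Fraction(-1), Fraction(0)])
```

In this setting the invariant vectors are the multiples of $(1, 1, 1)^t$, so the average of v is its mean coordinate times that vector, and any vector whose coordinates sum to zero averages to zero.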


Next, we turn to G-invariant subspaces. 


• Let $\rho$ be a representation of G on V. A subspace W of V is called G-invariant if gw is in
W for all w in W and g in G. So the operation by a group element must carry W to itself:
For all g,


(10.2.4)    $gW \subset W$,  or  $\rho_g W \subset W$.


This is an extension of the concept of T-invariant subspace that was introduced in Section 4.4.
Here we ask that W be an invariant subspace for each of the operators $\rho_g$.


When W is G-invariant, we can restrict the operation of G to obtain a representation
of G on W.


Lemma 10.2.5 If W is an invariant subspace of V, then $gW = W$ for all g in G.

Proof. Since group elements are invertible, their operations on V are invertible. So gW and
W have the same dimension. If $gW \subset W$, then $gW = W$. □


• If V is the direct sum of G-invariant subspaces, say $V = W_1 \oplus W_2$, the representation $\rho$
on V is called the direct sum of its restrictions to $W_1$ and $W_2$, and we write


(10.2.6)    $\rho = \alpha \oplus \beta$,


where $\alpha$ and $\beta$ denote the restrictions of $\rho$ to $W_1$ and $W_2$, respectively. Suppose that this
is the case, and let $B = (B_1, B_2)$ be a basis of V obtained by listing bases of $W_1$ and $W_2$ in
succession. Then the matrix of $\rho_g$ will have the block form


(10.2.7)    $R_g = \begin{bmatrix} A_g & 0 \\ 0 & B_g \end{bmatrix}$,

where $A_g$ is the matrix of $\alpha$ and $B_g$ is the matrix of $\beta$, with respect to the chosen bases. The
zeros below the block $A_g$ reflect the fact that the operation of g does not spill vectors out of
the subspace $W_1$, and the zeros above the block $B_g$ reflect the analogous fact for $W_2$.

Conversely, if R is a matrix representation and if all of the matrices $R_g$ have a block
form (10.2.7), with $A_g$ and $B_g$ square, we say that the matrix representation R is the direct
sum $A \oplus B$.

For example, since the symmetric group $S_3$ is isomorphic to the dihedral group $D_3$,
it is a rotation group, a subgroup of $SO_3$. We choose coordinates so that x acts on $\mathbb{R}^3$ as a
rotation with angle $2\pi/3$ about the $e_3$-axis, and y acts as a rotation by $\pi$ about the $e_1$-axis.
This gives us a three-dimensional matrix representation M:


(10.2.8)    $M_x = \begin{bmatrix} c & -s & 0 \\ s & c & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad M_y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$,


with $c = \cos 2\pi/3$ and $s = \sin 2\pi/3$. We see that M has a block decomposition, and that it is
the direct sum $A \oplus \Sigma$ of the standard representation and the sign representation.


Even when a representation $\rho$ is a direct sum, the matrix representation obtained
using a basis will not have a block form unless the basis is compatible with the direct sum
decomposition. Until we have made a further analysis, it may be difficult to tell that a
representation is a direct sum, when it is presented using the wrong basis. But if we find such
a decomposition of our representation $\rho$, we may try to decompose the summands $\alpha$ and $\beta$
further, and we may continue until no further decomposition is possible.


• If $\rho$ is a representation of a group G on V and if V has no proper G-invariant subspace, $\rho$
is called an irreducible representation. If V has a proper G-invariant subspace, $\rho$ is reducible.


The standard representation of $S_3$ is irreducible.

Suppose that our representation $\rho$ is reducible, and let W be a proper G-invariant
subspace of V. Let $\alpha$ be the restriction of $\rho$ to W. We extend a basis of W to a basis of V,
say $B = (w_1, \ldots, w_k; v_{k+1}, \ldots, v_d)$. The matrix of $\rho_g$ will have the block form


(10.2.9)    $R_g = \begin{bmatrix} A_g & * \\ 0 & B_g \end{bmatrix}$,

where $A_g$ is the matrix of $\alpha$ and $B_g$ is some other matrix representation of G. I think of the
block indicated by $*$ as "junk." Maschke's theorem, which is below, tells us that we can get
rid of that junk. But to do so we must choose the basis more carefully.


Theorem 10.2.10 Maschke’s Theorem. Every representation of a finite group G on a 
nonzero, finite-dimensional complex vector space is a direct sum of irreducible representa- 
tions. 


This theorem will be proved in the next section. We'll illustrate it here by one more
example in which G is the symmetric group $S_3$. We consider the representation of $S_3$ by the
permutation matrices that correspond to the permutations $x = (1\,2\,3)$ and $y = (1\,2)$. Let's
denote this representation by N:


(10.2.11)    $N_x = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad N_y = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.


There is no block decomposition for this pair of matrices. However, the vector
$w_1 = (1, 1, 1)^t$ is fixed by both matrices, so it is G-invariant, and the one-dimensional
subspace W spanned by $w_1$ is also G-invariant. The restriction of N to this subspace is the
trivial representation T. Let's change the standard basis of $\mathbb{C}^3$ to the basis $B = (w_1, e_2, e_3)$.
With respect to this new basis, the representation N is changed as follows, where P denotes
the matrix whose columns are $w_1$, $e_2$, $e_3$:


$P^{-1} N_x P = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix}, \qquad P^{-1} N_y P = \begin{bmatrix} 1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$.


The upper right blocks aren't zero, so we don't have a decomposition of the representation
as a direct sum.

There is a better approach: The matrices $N_x$ and $N_y$ are unitary, so $N_g$ is unitary
for all g in G. (They are orthogonal, but we are considering complex representations.)
Unitary matrices preserve orthogonality. Since W is G-invariant, the orthogonal space $W^\perp$
is G-invariant too (see (10.3.4)). If we form a basis by choosing vectors $w_2$ and $w_3$ from
$W^\perp$, the junk disappears. The permutation representation N is isomorphic to the direct sum
$T \oplus A$. We'll soon have techniques that make verifying this extremely simple, so we won't
bother doing so here.
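
To see the junk disappear in a concrete case, here is a small sketch. The vectors $w_2 = (1, -1, 0)^t$ and $w_3 = (0, 1, -1)^t$ are one possible choice of a basis of $W^\perp$, made here for illustration only (they are not orthonormal, which does not matter for the block form). The code computes the matrices of $N_x$ and $N_y$ with respect to $(w_1, w_2, w_3)$ in exact arithmetic.

```python
from fractions import Fraction

N_x = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
N_y = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

# w1 spans W; w2 and w3 (an assumed choice) span W-perp, the vectors with coordinate sum 0.
w1, w2, w3 = (1, 1, 1), (1, -1, 0), (0, 1, -1)

def apply(m, v):
    """Matrix times column vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))

def coords(v):
    """Coordinates (alpha, beta, gamma) with v = alpha*w1 + beta*w2 + gamma*w3."""
    a, b, c = v
    alpha = Fraction(a + b + c, 3)
    return (alpha, a - alpha, alpha - c)

def matrix_in_new_basis(m):
    """Matrix of v -> m v with respect to (w1, w2, w3); its columns are coordinate vectors."""
    cols = [coords(apply(m, w)) for w in (w1, w2, w3)]
    return [[cols[j][i] for j in range(3)] for i in range(3)]

R_x = matrix_in_new_basis(N_x)
R_y = matrix_in_new_basis(N_y)
print(R_x)
print(R_y)
```

Both matrices have zeros throughout the first row and first column apart from the corner entry 1, so they split into a $1 \times 1$ block and a $2 \times 2$ block. The $2 \times 2$ blocks have traces $-1$ and $0$, the same as the traces of $A_x$ and $A_y$.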

This decomposition of the representation using orthogonal spaces illustrates a general 
method that we investigate next. 


10.3 UNITARY REPRESENTATIONS 


Let V be a Hermitian space, that is, a complex vector space together with a positive definite
Hermitian form $\langle\, ,\, \rangle$. A unitary operator T on V is a linear operator with the property


(10.3.1)    $\langle Tv, Tw \rangle = \langle v, w \rangle$


for all v and w in V (8.6.3). If A is the matrix of a linear operator T with respect to an
orthonormal basis, then T is unitary if and only if A is a unitary matrix: $A^* = A^{-1}$.

• A representation $\rho: G \to GL(V)$ on a Hermitian space V is called unitary if $\rho_g$ is a
unitary operator for every g. We can write this condition as


(10.3.2)    $\langle gv, gw \rangle = \langle v, w \rangle$  or  $\langle \rho_g v, \rho_g w \rangle = \langle v, w \rangle$,


for all v and w in V and all g in G. Similarly, a matrix representation $R: G \to GL_n$
is unitary if $R_g$ is a unitary matrix for every g in G. A unitary matrix representation is a
homomorphism from G to the unitary group:


(10.3.3)    $R: G \to U_n$.


A representation $\rho$ on a Hermitian space will be unitary if and only if the matrix
representation obtained using an orthonormal basis is unitary.


Lemma 10.3.4 Let $\rho$ be a unitary representation of G on a Hermitian space V, and let W be
a G-invariant subspace. The orthogonal complement $W^\perp$ is also G-invariant, and $\rho$ is the
direct sum of its restrictions to the Hermitian spaces W and $W^\perp$. These restrictions are also
unitary representations.


Proof. It is true that $V = W \oplus W^\perp$ (8.5.1). Since $\rho$ is unitary, it preserves orthogonality: If
W is invariant and $u \perp W$, then $gu \perp gW = W$. This means that if $u \in W^\perp$, then $gu \in W^\perp$. □


The next corollary follows from the lemma by induction.


Corollary 10.3.5 Every unitary representation $\rho: G \to GL(V)$ on a Hermitian vector space
V is an orthogonal sum of irreducible representations. □


The trick now is to turn the condition (10.3.2) for a unitary representation around, and
think of it as a condition on the form instead of on the representation. Suppose we are given
a representation $\rho: G \to GL(V)$ on a vector space V, and let $\langle\, ,\, \rangle$ be a positive definite
Hermitian form on V. We say that the form is G-invariant if (10.3.2) holds. This is exactly
the same as saying that the representation is unitary, when we use the form to make V into
a Hermitian space. But if only the representation $\rho$ is given, we are free to choose the form.


Theorem 10.3.6 Let $\rho: G \to GL(V)$ be a representation of a finite group on a vector space
V. There exists a G-invariant, positive definite Hermitian form on V.


Proof. We begin with an arbitrary positive definite Hermitian form on V that we denote by
$\{\, ,\, \}$. For example, we may choose a basis for V and use it to transfer the standard Hermitian
form $X^*Y$ on $\mathbb{C}^n$ over to V. Then we use the averaging process to construct another form.
The averaged form is defined by


(10.3.7)    $\langle v, w \rangle = \frac{1}{|G|} \sum_{g \in G} \{gv, gw\}$.


We claim that this form is Hermitian, positive definite, and G-invariant. The verifications of
these properties are easy. We omit the first two, but we will verify G-invariance. The proof
is almost identical to the one used to show that averaging produces a G-invariant vector
(10.2.3), except that it is based here on the fact that right multiplication by an element h of
G defines a bijective map $G \to G$.

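Let h be an element of G. We must show that $\langle hv, hw \rangle = \langle v, w \rangle$ for all v and
w in V (10.3.2). We make the substitution $g' = gh$. As g runs over the group, so
does g'. Then


$\langle hv, hw \rangle = \frac{1}{|G|} \sum_{g} \{ghv, ghw\} = \frac{1}{|G|} \sum_{g} \{g'v, g'w\} = \frac{1}{|G|} \sum_{g'} \{g'v, g'w\} = \langle v, w \rangle$. □
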
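
The averaging of a form can also be carried out by machine. The sketch below is an illustration under stated assumptions: it takes the two non-unitary integer matrices obtained in Section 10.2 by changing the basis of the permutation representation of $S_3$ to $(w_1, e_2, e_3)$, generates the group they span, and averages the standard form $X^*Y$. Because the matrices are real, the adjoint is the transpose, and the averaged form is $\langle v, w \rangle = v^t H w$ with $H = \frac{1}{|G|} \sum_g R_g^t R_g$. All function names are illustrative.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of square matrices stored as tuples of row tuples."""
    n = len(a)
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
                 for i in range(n))

def transpose(a):
    return tuple(zip(*a))

def generate(gens):
    """All products of the generators, which is the whole group when it is finite."""
    n = len(gens[0])
    identity = tuple(tuple(int(i == j) for j in range(n)) for i in range(n))
    elements, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = matmul(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return sorted(elements)

R_x = ((1, 0, 1), (0, 0, -1), (0, 1, -1))
R_y = ((1, 1, 0), (0, -1, 0), (0, -1, 1))
I3 = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
G = generate([R_x, R_y])

# Matrix of the averaged form: H = (1/|G|) * sum over g of (R_g)^t R_g.
gram = [matmul(transpose(g), g) for g in G]
H = tuple(tuple(Fraction(sum(m[i][j] for m in gram), len(G)) for j in range(3))
          for i in range(3))

standard_form_invariant = all(matmul(transpose(h), h) == I3 for h in G)
averaged_form_invariant = all(matmul(transpose(h), matmul(H, h)) == H for h in G)
```

The standard form is not invariant for these matrices, while the averaged form satisfies $R_h^t H R_h = H$ for every h in the group, which is the matrix version of (10.3.2).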
Theorem 10.3.6 has remarkable consequences: 


Corollary 10.3.8


(a) (Maschke's Theorem): Every representation of a finite group G is a direct sum of
irreducible representations.

(b) Let $\rho: G \to GL(V)$ be a representation of a finite group G on a vector space V. There
exists a basis B of V such that the matrix representation R obtained from $\rho$ using this
basis is unitary.

(c) Let $R: G \to GL_n$ be a matrix representation of a finite group G. There is an invertible
matrix P such that $R'_g = P^{-1} R_g P$ is unitary for all g, i.e., such that R' is a homomorphism
from G to the unitary group $U_n$.

(d) Every finite subgroup of $GL_n$ is conjugate to a subgroup of the unitary group $U_n$.

Proof. (a) This follows from Theorem 10.3.6 and Corollary 10.3.5.


(b) Given $\rho$, we choose a G-invariant positive definite Hermitian form on V, and we take
for B an orthonormal basis with respect to this form. The associated matrix representation
will be unitary.


(c) This is the matrix form of (b), and it is derived in the usual way, by viewing R as a
representation on the space $\mathbb{C}^n$ and then changing basis.


(d) This is obtained from (c) by viewing the inclusion of a subgroup H into $GL_n$ as a matrix
representation of H. □


This corollary provides another proof of Theorem 4.7.14:

Corollary 10.3.9 Every matrix A of finite order in $GL_n(\mathbb{C})$ is diagonalizable.


Proof. The matrix A generates a finite cyclic subgroup of $GL_n$. By Corollary 10.3.8(d), this
subgroup is conjugate to a subgroup of the unitary group. Hence A is conjugate to a unitary
matrix. The Spectral Theorem 8.6.8 tells us that a unitary matrix is diagonalizable. Therefore
A is diagonalizable. □


10.4 CHARACTERS 


As mentioned in the first section, one works almost exclusively with characters, one reason
being that representations are complicated. The character $\chi$ of a representation $\rho$ is the
complex-valued function whose domain is the group G, defined by

(10.4.1)    $\chi(g) = \operatorname{trace} \rho_g$.


If R is the matrix representation obtained from $\rho$ by a choice of basis, then $\chi$ is also
the character of R. The dimension of the vector space V is called the dimension of the
representation $\rho$, and also the dimension of its character $\chi$. The character of an irreducible
representation is called an irreducible character.


Here are some basic properties of the character.


Proposition 10.4.2 Let $\chi$ be the character of a representation $\rho$ of a finite group G.

(a) $\chi(1)$ is the dimension of $\chi$.

(b) The character is constant on conjugacy classes: If $g' = hgh^{-1}$, then $\chi(g') = \chi(g)$.

(c) Let g be an element of G of order k. The roots of the characteristic polynomial of $\rho_g$
are powers of the k-th root of unity $\zeta = e^{2\pi i/k}$. If $\rho$ has dimension d, then $\chi(g)$ is a sum
of d such powers.

(d) $\chi(g^{-1})$ is the complex conjugate $\overline{\chi(g)}$ of $\chi(g)$.

(e) The character of a direct sum $\rho \oplus \rho'$ of representations is the sum $\chi + \chi'$ of their
characters.

(f) Isomorphic representations have the same character.

Proof. Parts (a) and (b) were discussed before, for matrix representations (see (10.1.6)).


(c) The trace of $\rho_g$ is the sum of its eigenvalues. If $\lambda$ is an eigenvalue of $\rho_g$, then $\lambda^k$ is an
eigenvalue of $\rho_g^k$, and if $g^k = 1$, then $\rho_g^k = I$ and $\lambda^k = 1$. So $\lambda$ is a power of $\zeta$.


(d) The eigenvalues $\lambda_1, \ldots, \lambda_d$ of $R_g$ have absolute value 1 because they are roots of
unity. For any complex number $\lambda$ of absolute value 1, $\lambda^{-1} = \overline{\lambda}$. Therefore
$\chi(g^{-1}) = \lambda_1^{-1} + \cdots + \lambda_d^{-1} = \overline{\lambda}_1 + \cdots + \overline{\lambda}_d = \overline{\chi(g)}$.


Parts (e) and (f) are obvious. □


Two things simplify the computation of a character $\chi$. First, since $\chi$ is constant on
conjugacy classes, we need only determine the value of $\chi$ on one element in each class, a
representative element. Second, since trace is independent of a basis, we may select a
convenient basis for each individual group element to compute it. We don't need to use the
same basis for all elements.


There is a Hermitian product on characters, defined by


(10.4.3)    $\langle \chi, \chi' \rangle = \frac{1}{|G|} \sum_{g} \overline{\chi(g)}\, \chi'(g)$.


When $\chi$ and $\chi'$ are viewed as vectors, as in Table 10.1.6, this is the standard Hermitian
product (8.3.3), scaled by the factor $\frac{1}{|G|}$.

It is convenient to rewrite this formula by grouping the terms for each conjugacy class.
This is permissible because the characters are constant on them. We number the conjugacy
classes arbitrarily, as $C_1, \ldots, C_r$, and we let $c_i$ denote the order of the class $C_i$. We also
choose a representative element $g_i$ in the class $C_i$. Then


(10.4.4)    $\langle \chi, \chi' \rangle = \frac{1}{|G|} \sum_{i=1}^{r} c_i\, \overline{\chi(g_i)}\, \chi'(g_i)$.


We go back to our usual example: Let G be the symmetric group $S_3$. Its class equation
is $6 = 1 + 2 + 3$, and the elements 1, x, y represent the conjugacy classes of orders 1, 2, 3,
respectively. Then


$\langle \chi, \chi' \rangle = \frac{1}{6} \left( \overline{\chi(1)}\chi'(1) + 2\,\overline{\chi(x)}\chi'(x) + 3\,\overline{\chi(y)}\chi'(y) \right)$.

Looking at Table 10.1.6, we find

(10.4.5)    $\langle \chi_A, \chi_A \rangle = \frac{1}{6}(4 + 2 + 0) = 1$  and  $\langle \chi_A, \chi_\Sigma \rangle = \frac{1}{6}(2 - 2 + 0) = 0$.


The characters $\chi_T$, $\chi_\Sigma$, $\chi_A$ are orthonormal with respect to the Hermitian product $\langle\, ,\, \rangle$.
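
The class-by-class formula (10.4.4) is easy to evaluate mechanically. The following sketch does so for $S_3$, using the class orders 1, 2, 3 and the character values on the representatives 1, x, y. The function name `product` is illustrative, and the code relies on the character values being integers (so that exact fractions can be used).

```python
from fractions import Fraction

class_sizes = [1, 2, 3]          # orders of the classes of 1, x, y
characters = {
    "T": [1, 1, 1],
    "Sigma": [1, 1, -1],
    "A": [2, -1, 0],
}

def product(chi, psi, sizes):
    """<chi, psi> = (1/|G|) * sum_i c_i * conj(chi(g_i)) * psi(g_i), as in (10.4.4)."""
    order = sum(sizes)           # the class equation gives |G|
    total = sum(c * a.conjugate() * b for c, a, b in zip(sizes, chi, psi))
    return Fraction(total, order)

gram = {(p, q): product(characters[p], characters[q], class_sizes)
        for p in characters for q in characters}
for pair, value in gram.items():
    print(pair, value)
```

The nine products form the identity matrix, which is the orthonormality statement above.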


These computations illustrate the Main Theorem on characters. It is one of the most 
beautiful theorems of algebra, both because it is so elegant, and because it simplifies the 
problem of classifying representations so much. 


Theorem 10.4.6 Main Theorem. Let G be a finite group.


(a) (orthogonality relations) The irreducible characters of G are orthonormal: If $\chi_i$ is the
character of an irreducible representation $\rho_i$, then $\langle \chi_i, \chi_i \rangle = 1$. If $\chi_i$ and $\chi_j$ are the
characters of nonisomorphic irreducible representations $\rho_i$ and $\rho_j$, then $\langle \chi_i, \chi_j \rangle = 0$.

(b) There are finitely many isomorphism classes of irreducible representations, the same
number as the number of conjugacy classes in the group.


(c) Let $\rho_1, \ldots, \rho_r$ represent the isomorphism classes of irreducible representations of G,
and let $\chi_1, \ldots, \chi_r$ be their characters. The dimension $d_i$ of $\rho_i$ (or of $\chi_i$) divides the
order |G| of the group, and $|G| = d_1^2 + \cdots + d_r^2$.


This theorem is proved in Section 10.8, except we won't prove that $d_i$ divides |G|.


One should compare (c) with the class equation. Let the conjugacy classes be
$C_1, \ldots, C_r$ and let $c_i = |C_i|$. Then $c_i$ divides |G|, and $|G| = c_1 + \cdots + c_r$.


The Main Theorem allows us to decompose any character as a linear combination of
the irreducible characters, using the formula for orthogonal projection (8.4.11). Maschke's
Theorem tells us that every representation $\rho$ is isomorphic to a direct sum of the irreducible
representations $\rho_1, \ldots, \rho_r$. We write this symbolically as


(10.4.7)    $\rho \approx n_1\rho_1 \oplus \cdots \oplus n_r\rho_r$,


where $n_i$ are non-negative integers, and $n_i\rho_i$ stands for the direct sum of $n_i$ copies of $\rho_i$.

Corollary 10.4.8 Let $\rho_1, \ldots, \rho_r$ represent the isomorphism classes of irreducible
representations of a finite group G, and let $\rho$ be any representation of G. Let $\chi_i$ and $\chi$ be the
characters of $\rho_i$ and $\rho$, respectively, and let $n_i = \langle \chi, \chi_i \rangle$. Then

(a) $\chi = n_1\chi_1 + \cdots + n_r\chi_r$, and

(b) $\rho$ is isomorphic to $n_1\rho_1 \oplus \cdots \oplus n_r\rho_r$.

(c) Two representations $\rho$ and $\rho'$ of a finite group G are isomorphic if and only if their
characters are equal.


Proof. Any representation $\rho$ is isomorphic to an integer combination $m_1\rho_1 \oplus \cdots \oplus m_r\rho_r$
of the representations $\rho_i$, and then $\chi = m_1\chi_1 + \cdots + m_r\chi_r$ (Proposition 10.4.2). Since the
characters $\chi_i$ are orthonormal, the projection formula shows that $m_i = n_i$. This proves (a)
and (b), and (c) follows. □
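
As an illustration of the corollary (a sketch, using character values read off from the matrices given earlier in the chapter), we can compute the multiplicities $n_i = \langle \chi, \chi_i \rangle$ for the permutation representation N of (10.2.11) and the rotation representation M of (10.2.8) of $S_3$. The traces of $N_x$ and $N_y$ are 0 and 1, and the traces of $M_x$ and $M_y$ are $2c + 1 = 0$ and $-1$. The names `product` and `multiplicities` are illustrative.

```python
from fractions import Fraction

class_sizes = [1, 2, 3]          # orders of the classes of 1, x, y in S_3
irreducible = {
    "T": [1, 1, 1],
    "Sigma": [1, 1, -1],
    "A": [2, -1, 0],
}

def product(chi, psi, sizes=class_sizes):
    """<chi, psi> computed class by class; valid here because all values are integers."""
    total = sum(c * a.conjugate() * b for c, a, b in zip(sizes, chi, psi))
    return Fraction(total, sum(sizes))

def multiplicities(chi):
    """The integers n_i = <chi, chi_i> for each irreducible character chi_i."""
    return {name: product(chi, chi_i) for name, chi_i in irreducible.items()}

chi_N = [3, 0, 1]     # traces of I, N_x, N_y
chi_M = [3, 0, -1]    # traces of I, M_x, M_y

n_N = multiplicities(chi_N)
n_M = multiplicities(chi_M)
print(n_N)
print(n_M)
```

The multiplicities come out as (1, 0, 1) for N and (0, 1, 1) for M in the order T, $\Sigma$, A, in agreement with the decompositions $N \approx T \oplus A$ and $M = A \oplus \Sigma$ found in Section 10.2. In both cases $\langle \chi, \chi \rangle = 2$.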


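Corollary 10.4.9 For any characters $\chi$ and $\chi'$, $\langle \chi, \chi' \rangle$ is an integer. □


Note also that, with $\chi$ as in (10.4.8)(a),

(10.4.10)    $\langle \chi, \chi \rangle = n_1^2 + \cdots + n_r^2$.


Some consequences of this formula are:


$\langle \chi, \chi \rangle = 1$  $\Leftrightarrow$  $\chi$ is an irreducible character,
$\langle \chi, \chi \rangle = 2$  $\Leftrightarrow$  $\chi$ is the sum of two distinct irreducible characters,
$\langle \chi, \chi \rangle = 3$  $\Leftrightarrow$  $\chi$ is the sum of three distinct irreducible characters,
$\langle \chi, \chi \rangle = 4$  $\Leftrightarrow$  $\chi$ is either the sum of four distinct irreducible characters, or
$\chi = 2\chi_i$ for some irreducible character $\chi_i$.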


A complex-valued function on the group, such as a character, that is constant on each
conjugacy class, is called a class function. A class function $\varphi$ can be given by assigning
arbitrary values to each conjugacy class. So the complex vector space $\mathcal{H}$ of class functions
has dimension equal to the number of conjugacy classes. We use the same product as (10.4.3)
to make $\mathcal{H}$ into a Hermitian space:


$\langle \varphi, \psi \rangle = \frac{1}{|G|} \sum_{g} \overline{\varphi(g)}\, \psi(g)$.


Corollary 10.4.11 The irreducible characters form an orthonormal basis of the space $\mathcal{H}$ of
class functions.


This follows from parts (a) and (b) of the Main Theorem. The characters are independent
because they are orthonormal, and they span $\mathcal{H}$ because the dimension of $\mathcal{H}$ is equal to the
number of conjugacy classes. □


Using the Main Theorem, it becomes easy to see that T, $\Sigma$, and A represent all of the
isomorphism classes of irreducible representations of the group $S_3$ (see Section 10.1). Since
there are three conjugacy classes, there are three irreducible representations. We verified
above (10.4.5) that $\langle \chi_A, \chi_A \rangle = 1$, so A is an irreducible representation. The representations
T and $\Sigma$ are obviously irreducible because they are one-dimensional. And, these three
representations are not isomorphic because their characters are distinct.

The irreducible characters of a group can be assembled into a table, the character
table of the group. It is customary to list the values of the character on a conjugacy class
just once. Table 10.1.6, showing the irreducible characters of $S_3$, gets compressed into
three columns. In the table below, the three conjugacy classes in $S_3$ are described by the
representative elements 1, x, y, and for reference, the orders of the conjugacy classes are
given above them in parentheses. We have assigned indices to the irreducible characters:
$\chi_T = \chi_1$, $\chi_\Sigma = \chi_2$, and $\chi_A = \chi_3$.

                (1)   (2)   (3)      order of the conjugacy class
                 1     x     y       representative element

    $\chi_1$     1     1     1
    $\chi_2$     1     1    -1       value of the irreducible character
    $\chi_3$     2    -1     0


(10.4.12) Character table of the symmetric group $S_3$


In such a table, we put the trivial character, the character of the trivial representation,
into the top row. It consists entirely of 1's. The first column lists the dimensions of the
representations (10.4.2)(a).


We determine the character table of the tetrahedral group T of 12 rotational symmetries
of a tetrahedron next. Let x denote rotation by $2\pi/3$ about a face, and let z denote rotation
by $\pi$ about the center of an edge, as in Figure 7.10.8. The conjugacy classes are C(1),
C(x), C($x^2$), and C(z), and their orders are 1, 4, 4, and 3, respectively. So there are four
irreducible characters; let their dimensions be $d_i$. Then $12 = d_1^2 + \cdots + d_4^2$. The only solution
of this equation is $12 = 1^2 + 1^2 + 1^2 + 3^2$, so the dimensions of the irreducible representations
are 1, 1, 1, 3. We write the table first with undetermined entries:


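                (1)   (4)   (4)   (3)
                 1     x    x^2    z

    $\chi_1$     1     1     1     1
    $\chi_2$     1     a     b     c
    $\chi_3$     1     a'    b'    c'
    $\chi_4$     3     *     *     *


and we evaluate the form (10.4.4) on the orthogonal characters $\chi_1$ and $\chi_2$:


(10.4.13)    $\langle \chi_1, \chi_2 \rangle = \frac{1}{12}(1 + 4a + 4b + 3c) = 0$.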
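
The claim that $12 = 1^2 + 1^2 + 1^2 + 3^2$ is the only way to write 12 as a sum of four positive squares can be confirmed by brute force. The function below is an illustrative sketch; it lists the dimensions in nondecreasing order, so each solution appears once.

```python
from itertools import combinations_with_replacement
from math import isqrt

def dimension_lists(order, count):
    """Nondecreasing tuples (d_1, ..., d_count) of positive integers with sum of squares = order."""
    candidates = range(1, isqrt(order) + 1)
    return [dims for dims in combinations_with_replacement(candidates, count)
            if sum(d * d for d in dims) == order]

print(dimension_lists(12, 4))   # tetrahedral group: order 12, four conjugacy classes
print(dimension_lists(6, 3))    # symmetric group S_3: order 6, three conjugacy classes
```

For larger groups this equation usually has several solutions, and further information (such as the fact that each $d_i$ divides the order of the group) is needed to narrow them down.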


Since $\chi_2$ is a one-dimensional character, $\chi_2(z) = c$ is the trace of a $1 \times 1$ matrix. It is the
unique entry in that matrix, and since $z^2 = 1$, its square is 1. So c is equal to 1 or $-1$. Similarly,
since $x^3 = 1$, $\chi_2(x) = a$ will be a power of $\omega = e^{2\pi i/3}$. So a is equal to 1, $\omega$, or $\omega^2$. Moreover,
$b = a^2$. Looking at (10.4.13), one sees that a = 1 is impossible, because then b = 1 and
$1 + 4 + 4 + 3c = 9 + 3c$ is not zero when $c = \pm 1$. The possible values are
$a = \omega$ or $\omega^2$, and then c = 1. The same reasoning applies to the character $\chi_3$. Since $\chi_2$
and $\chi_3$ are distinct, and since we can interchange them, we may assume that $a = \omega$ and
$a' = \omega^2$. It is natural to guess that the irreducible three-dimensional character $\chi_4$ might be
the character of the standard representation of T by rotations, and it is easy to verify this by
computing that character and checking that $\langle \chi, \chi \rangle = 1$. Since we know the other characters,
$\chi_4$ is also determined by the fact that the characters are orthonormal. The character table is


                (1)   (4)   (4)   (3)
                 1     x    x^2    z

    $\chi_1$     1     1     1     1
    $\chi_2$     1     ω     ω^2   1
    $\chi_3$     1     ω^2   ω     1
    $\chi_4$     3     0     0    -1


(10.4.14) Character table of the tetrahedral group


The columns in these tables are orthogonal. This is a general phenomenon, whose 
proof we leave as Exercise 4.6. 
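
Both orthogonality statements can be tested numerically for the tetrahedral group. The sketch below assumes the character values that come out of the derivation above: $\chi_2 = (1, \omega, \omega^2, 1)$, $\chi_3 = (1, \omega^2, \omega, 1)$, and $\chi_4 = (3, 0, 0, -1)$ on the classes of $1, x, x^2, z$, the last row being the traces of the rotations (a rotation by angle $\theta$ of three-space has trace $1 + 2\cos\theta$). Floating-point complex numbers are used, so comparisons are made up to a tolerance.

```python
import cmath

omega = cmath.exp(2j * cmath.pi / 3)
class_sizes = [1, 4, 4, 3]                 # orders of the classes of 1, x, x^2, z
table = [
    [1, 1, 1, 1],
    [1, omega, omega ** 2, 1],
    [1, omega ** 2, omega, 1],
    [3, 0, 0, -1],
]

def product(chi, psi):
    """The Hermitian product (10.4.4), conjugating the first argument."""
    order = sum(class_sizes)
    return sum(c * complex(a).conjugate() * b
               for c, a, b in zip(class_sizes, chi, psi)) / order

rows_orthonormal = all(
    abs(product(table[i], table[j]) - (1 if i == j else 0)) < 1e-12
    for i in range(4) for j in range(4))

columns = list(zip(*table))
columns_orthogonal = all(
    abs(sum(complex(a).conjugate() * b for a, b in zip(columns[i], columns[j]))) < 1e-12
    for i in range(4) for j in range(i + 1, 4))

print(rows_orthonormal, columns_orthogonal)
```

Note that the rows are compared using the weights $c_i$, while the columns are compared with the unweighted standard Hermitian product.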


10.5 ONE-DIMENSIONAL CHARACTERS 


A one-dimensional character is the character of a representation of G on a one-dimensional
vector space. If $\rho$ is a one-dimensional representation, then $\rho_g$ is represented by a $1 \times 1$
matrix $R_g$, and $\chi(g)$ is the unique entry in that matrix. Speaking loosely,


(10.5.1)    $\chi(g) = \rho_g = R_g$.


A one-dimensional character $\chi$ is a homomorphism from G to $GL_1 = \mathbb{C}^\times$, because


$\chi(gh) = \rho_{gh} = \rho_g \rho_h = \chi(g)\chi(h)$.


If $\chi$ is one-dimensional and if g is an element of G of order k, then $\chi(g)$ is a power of the
primitive root of unity $\zeta = e^{2\pi i/k}$. And since $\mathbb{C}^\times$ is abelian, any commutator is in
the kernel of such a character.

Normal subgroups are among the many things that can be determined by looking at a
character table. The kernel of a one-dimensional character $\chi$ is the union of the conjugacy
classes C(g) such that $\chi(g) = 1$. For instance, the kernel of the character $\chi_2$ in the character
table of the tetrahedral group T is the union of the two conjugacy classes $C(1) \cup C(z)$. It is
a normal subgroup of order four that we have seen before.


Warning: A character of dimension greater than 1 is not a homomorphism. The values taken 
on by such a character are sums of roots of unity. 


Theorem 10.5.2 Let G be a finite abelian group. 


(a) Every irreducible character of G is one-dimensional. The number of irreducible characters
is equal to the order of the group.

(b) Every matrix representation R of G is diagonalizable: There is an invertible matrix P
such that $P^{-1}R_gP$ is diagonal for all g.

Proof. In an abelian group of order N, there will be N conjugacy classes, each containing
a single element. Then according to the Main Theorem, the number of irreducible
representations is also equal to N. The formula $N = d_1^2 + \cdots + d_N^2$ shows that $d_i = 1$
for all i. This proves (a). Part (b) follows from (a) together with Maschke's Theorem: in a
basis compatible with a decomposition into one-dimensional invariant subspaces, every
$R_g$ is diagonal. □


A simple example: The cyclic group C_3 = {1, x, x²} of order 3 has three irreducible
characters of dimension 1. If χ is one of them, then χ(x) will be a power of ω = e^{2πi/3} and
χ(x²) = χ(x)². Since there are three distinct powers of ω and three irreducible characters,
χ_i(x) must take on all three values. The character table of C_3 is therefore


          (1)   (1)   (1)
           1     x     x²
χ_1        1     1     1
χ_2        1     ω     ω²
χ_3        1     ω²    ω

(10.5.3) Character table of the cyclic group C_3
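
As a numerical check of the table above, the following Python sketch builds the character table of a cyclic group C_n from powers of a primitive n-th root of unity and tests that the rows are orthonormal for the Hermitian product of (10.4.3). The names `cyclic_character_table` and `inner` are illustrative choices, not notation from the text.

```python
import cmath

def cyclic_character_table(n):
    """Row k, column j holds chi_k(x^j) = omega^(j*k), with omega = e^(2*pi*i/n)."""
    omega = cmath.exp(2j * cmath.pi / n)
    return [[omega ** (j * k) for j in range(n)] for k in range(n)]

def inner(chi, psi):
    """Hermitian product <chi, psi> = (1/|G|) * sum over g of conj(chi(g)) * psi(g)."""
    order = len(chi)
    return sum(a.conjugate() * b for a, b in zip(chi, psi)) / order

# Row 0 is the trivial character; rows 1 and 2 are (1, w, w^2) and (1, w^2, w).
table = cyclic_character_table(3)
gram = [[inner(r, s) for s in table] for r in table]
```

The Gram matrix `gram` should be the identity up to rounding, which is the orthonormality of the three characters of C_3.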


10.6 THE REGULAR REPRESENTATION 


Let S = (s_1, ..., s_n) be a finite ordered set on which a group G operates, and let R_g denote
the permutation matrix that describes the operation of a group element g on S. If g operates
on S as the permutation p, i.e., if gs_i = s_{pi}, that matrix is (see (1.5.7))


(10.6.1)    $R_g = \sum_i e_{pi,\,i},$


and R_g e_i = e_{pi}. The map g ↦ R_g defines a matrix representation R of G that we call a
permutation representation, though that phrase had a different meaning in Section 6.11. The
representation (10.2.11) of S_3 is an example of a permutation representation.

The ordering of S is used only so that we can assemble R_g into a matrix. It is nicer
to describe a permutation representation without reference to an ordering. To do this we
introduce a vector space V_S that has the unordered basis {e_s} indexed by elements of S.
Elements of V_S are linear combinations Σ_s c_s e_s, with complex coefficients c_s. If we are
given an operation of G on the set S, the associated permutation representation ρ of G on
V_S is defined by


(10.6.2)    $\rho_g(e_s) = e_{gs}.$


When we choose an ordering of S, the basis {e_s} becomes an ordered basis, and the matrix
of ρ_g has the form described above.
The character of a permutation representation is especially easy to compute: 


Lemma 10.6.3 Let ρ be the permutation representation associated to an operation of a
group G on a nonempty finite set S, and let χ be its character. For all g in G, χ(g) is equal
to the number of elements of S that are fixed by g.


Proof. We order the set S arbitrarily. Then for every element s that is fixed by g, there is a 1 on
the diagonal of the matrix R_g (10.6.1), and for every element that is not fixed, there is a 0. □
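
The lemma can be checked mechanically. The sketch below, written for illustration only, takes the six permutations of a three-element set (the symmetric group S_3 acting on indices 0, 1, 2, an assumed example), builds each permutation matrix with the convention R e_i = e_{p(i)}, and compares its trace with the number of fixed points. The helper names are hypothetical.

```python
from itertools import permutations

def perm_matrix(p):
    """Permutation matrix R with R e_i = e_{p(i)}: column i has its 1 in row p[i]."""
    n = len(p)
    R = [[0] * n for _ in range(n)]
    for i in range(n):
        R[p[i]][i] = 1
    return R

def trace(R):
    return sum(R[i][i] for i in range(len(R)))

def fixed_points(p):
    return sum(1 for i, image in enumerate(p) if image == i)

# Each tuple p lists the images p(0), p(1), p(2).
S3 = list(permutations(range(3)))
character = {p: trace(perm_matrix(p)) for p in S3}
```

The dictionary `character` takes the value 3 on the identity, 1 on each transposition, and 0 on each 3-cycle.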

When we decompose a set on which G operates into orbits, we will obtain a decomposition
of the permutation representation ρ or R as a direct sum. This is easy to see. But
there is an important new feature: The fact that linear combinations are available allows us
to decompose the representation further. Even when the operation of G on S is transitive,
ρ will not be irreducible unless S is a set of one element.


Lemma 10.6.4 Let R be the permutation representation associated to an operation of G on
a finite nonempty ordered set S. When its character χ is written as an integer combination
of the irreducible characters, the trivial character χ_1 appears.


Proof. The vector Σ_s e_s of V_S, which corresponds to (1, 1, ..., 1)^t in C^n, is fixed by every
permutation of S, so it spans a G-invariant subspace of dimension 1 on which the group
operates trivially. □


Example 10.6.5 Let G be the tetrahedral group T, and let S be the set (v_1, ..., v_4) of vertices
of the tetrahedron. The operation of G on S defines a four-dimensional representation of
G. Let x denote the rotation by 2π/3 about a face and z the rotation by π about an edge, as
before (see 7.10.8). Then x acts as the 3-cycle (2 3 4) and z acts as (1 3)(2 4). The associated
permutation representation is


(10.6.6)    $R_x = \begin{bmatrix} 1&0&0&0\\ 0&0&0&1\\ 0&1&0&0\\ 0&0&1&0 \end{bmatrix}, \qquad R_z = \begin{bmatrix} 0&0&1&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&1&0&0 \end{bmatrix}.$

Its character is

                      1    z    x    x²
(10.6.7)    χ^vert    4    0    1    1


The character table (10.4.14) shows that χ^vert = χ_1 + χ_4. By the way, another way to
determine the character χ_4 in the character table is to check that ⟨χ^vert, χ^vert⟩ = 2. Then
χ^vert is a sum of two irreducible characters. Lemma 10.6.4 shows that one of them is the
trivial character χ_1. So χ^vert − χ_1 is an irreducible character. It must be χ_4. □
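
The computation ⟨χ^vert, χ^vert⟩ = 2 can be reproduced by brute force. The sketch below generates the twelve permutations of the four vertices from the two generators in the example (written 0-indexed, so the 3-cycle (2 3 4) becomes 1 → 2 → 3 → 1), counts fixed points to get χ^vert, and averages over the group. The function names are illustrative, and the closure routine is a generic assumption that works for any finite set of generating permutations.

```python
from fractions import Fraction

def compose(p, q):
    """Composite permutation: first apply q, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Close a set of permutations under left multiplication by the generators."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

x = (0, 2, 3, 1)   # the 3-cycle (2 3 4), 0-indexed
z = (2, 3, 0, 1)   # the product (1 3)(2 4), 0-indexed
T = generate([x, z])

# chi_vert(g) = number of vertices fixed by g (Lemma 10.6.3); it is real-valued.
chi_vert = {g: sum(1 for i in range(4) if g[i] == i) for g in T}
norm_sq = Fraction(sum(v * v for v in chi_vert.values()), len(T))
trivial_multiplicity = Fraction(sum(chi_vert.values()), len(T))
```

Here `norm_sq` is ⟨χ^vert, χ^vert⟩ and `trivial_multiplicity` is ⟨χ^vert, χ_1⟩, so the two numbers confirm that χ^vert is the trivial character plus one other irreducible character.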


• The regular representation ρ^reg of a group G is the representation associated to the
operation of G on itself by left multiplication. It is a representation on the vector space V_G
that has a basis {e_g} indexed by elements of G. If h is an element of G, then


(10.6.8)    $\rho^{reg}_h(e_g) = e_{hg}.$


This operation of G on itself by left multiplication isn't particularly interesting, but the
associated permutation representation ρ^reg is very interesting. Its character χ^reg is simple:


(10.6.9)    $\chi^{reg}(1) = |G|, \quad\text{and}\quad \chi^{reg}(g) = 0 \ \text{ if } g \neq 1.$


This is true because the dimension of ρ^reg is the order of the group, and because multiplication
by g doesn't fix any element of G unless g = 1.

This simple formula makes it easy to compute ⟨χ^reg, χ⟩ for any character χ:

(10.6.10)    $\langle\chi^{reg},\chi\rangle = \frac{1}{|G|}\sum_g \overline{\chi^{reg}(g)}\,\chi(g) = \frac{1}{|G|}\,\overline{\chi^{reg}(1)}\,\chi(1) = \chi(1) = \dim\chi.$


Corollary 10.6.11 Let χ_1, ..., χ_r be the irreducible characters of a finite group G, let ρ_i be
a representation with character χ_i, and let d_i = dim χ_i. Then χ^reg = d_1χ_1 + ··· + d_rχ_r,
and ρ^reg is isomorphic to d_1ρ_1 ⊕ ··· ⊕ d_rρ_r.


This follows from (10.6.10) and the projection formula. Isn't it nice? Counting
dimensions,


(10.6.12)    $|G| = \dim\chi^{reg} = \sum_{i=1}^{r} d_i\dim\chi_i = \sum_{i=1}^{r} d_i^2.$

This is the formula in (c) of the Main Theorem. So that formula follows from the orthogonality
relations (10.4.6)(a).

For instance, the character of the regular representation of the symmetric group S_3 is


             1    x    y
χ^reg        6    0    0


Looking at the character table (10.4.12) for S_3, one sees that χ^reg = χ_1 + χ_2 + 2χ_3, as
expected.
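
The decomposition of χ^reg for S_3 can be recomputed from the projection formula. The sketch below assumes the standard character table of S_3 (classes of 1, x, y with sizes 1, 2, 3), entered by hand since table (10.4.12) is not reproduced in this section; the variable names are illustrative.

```python
from fractions import Fraction

class_sizes = [1, 2, 3]          # conjugacy classes of 1, x, y in S_3
table = {
    "chi1": [1, 1, 1],           # trivial character
    "chi2": [1, 1, -1],          # sign character
    "chi3": [2, -1, 0],          # two-dimensional character
}
chi_reg = [6, 0, 0]              # regular character, as in (10.6.9)

def inner(chi, psi):
    """<chi, psi> summed class by class; all values here are real."""
    total = sum(c * a * b for c, a, b in zip(class_sizes, chi, psi))
    return Fraction(total, sum(class_sizes))

# Multiplicity of each irreducible character in the regular character.
multiplicity = {name: inner(chi_reg, chi) for name, chi in table.items()}
```

Each multiplicity equals the dimension of the corresponding character, as Corollary 10.6.11 predicts.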

Still one more way to determine the last character χ_4 of the tetrahedral group (see
(10.4.14)) is to use the relation χ^reg = χ_1 + χ_2 + χ_3 + 3χ_4.


We determine the character table of the icosahedral group I next. As we know, I is
isomorphic to the alternating group A_5 (7.4.4). The conjugacy classes have been determined
before (7.4.1). They are listed below, with representative elements taken from A_5:


             class                                        representative
             C_1 = {1}                                    (1)
             C_2 = 15 edge rotations, angle π             (1 2)(3 4)
(10.6.13)    C_3 = 20 vertex rotations, angles ±2π/3      (1 2 3)
             C_4 = 12 face rotations, angles ±2π/5        (1 2 3 4 5)
             C_5 = 12 face rotations, angles ±4π/5        (1 3 5 2 4)

Since there are five conjugacy classes, there are five irreducible characters. The
character table is

          (1)   (15)   (20)   (12)   (12)
           0     π     2π/3   2π/5   4π/5    angle
χ_1        1     1      1      1      1
χ_2        3    -1      0      α      β
χ_3        3    -1      0      β      α
χ_4        4     0      1     -1     -1
χ_5        5     1     -1      0      0

(10.6.14) Character table of the icosahedral group I

The entries α and β are explained below. One way to find the irreducible characters is
to decompose some permutation representations. The alternating group A_5 operates on the
set of five indices. This gives us a five-dimensional permutation representation; we'll call it
ρ'. Its character χ' is


          0     π    2π/3   2π/5   4π/5    angle
χ'        5     1     2      0      0


Then ⟨χ', χ'⟩ = (1/60)(1·5² + 15·1² + 20·2² + 12·0² + 12·0²) = 2. Therefore χ' is the sum of two distinct
irreducible characters. Since the trivial representation is a summand, χ' − χ_1 is an irreducible
character, the one labeled χ_4 in the table.

Next, the icosahedral group I operates on the set of six pairs of opposite faces of the
dodecahedron; let the corresponding six-dimensional character be χ''. A similar computation
shows that χ'' − χ_1 is the irreducible character χ_5.

We also have the representation of dimension 3 of I as a rotation group. Its character
is χ_2. To compute that character, we remember that the trace of a rotation of R³ with angle
θ is 1 + 2cos θ, which is also equal to 1 + e^{iθ} + e^{−iθ} (5.1.28). The second and third entries for
χ_2 are 1 + 2cos π = −1 and 1 + 2cos 2π/3 = 0. The last two entries are labeled


$\alpha = 1 + 2\cos(2\pi/5) = 1 + \zeta + \zeta^4 \quad\text{and}\quad \beta = 1 + 2\cos(4\pi/5) = 1 + \zeta^2 + \zeta^3,$


where ζ = e^{2πi/5}. The remaining character χ_3 can be determined by orthogonality, or by
using the relation

$\chi^{reg} = \chi_1 + 3\chi_2 + 3\chi_3 + 4\chi_4 + 5\chi_5.$
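
The relations in this discussion can be tested numerically. The sketch below enters the five characters as the text describes them (χ_4 = χ' − χ_1, χ_5 = χ'' − χ_1, χ_2 from the rotation traces, and χ_3 as χ_2 with α and β exchanged, which is an assumption about the remaining row) and checks orthonormality and the regular-representation relation. Class sizes follow (10.6.13); the variable names are illustrative.

```python
import math

sizes = [1, 15, 20, 12, 12]                      # class sizes from (10.6.13)
alpha = 1 + 2 * math.cos(2 * math.pi / 5)
beta = 1 + 2 * math.cos(4 * math.pi / 5)

table = [
    [1, 1, 1, 1, 1],                             # chi_1, trivial
    [3, -1, 0, alpha, beta],                     # chi_2, rotation representation
    [3, -1, 0, beta, alpha],                     # chi_3, assumed: alpha and beta swapped
    [4, 0, 1, -1, -1],                           # chi_4 = chi' - chi_1
    [5, 1, -1, 0, 0],                            # chi_5 = chi'' - chi_1
]

def inner(chi, psi):
    """<chi, psi> summed over classes; every value in this table is real."""
    return sum(c * a * b for c, a, b in zip(sizes, chi, psi)) / sum(sizes)

gram = [[inner(r, s) for s in table] for r in table]

# Column j of the sum of d_i * chi_i; this should be the regular character (60, 0, 0, 0, 0).
regular = [sum(row[0] * row[j] for row in table) for j in range(5)]
```

If the table is entered correctly, `gram` is the 5×5 identity up to rounding and `regular` vanishes away from the identity class.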


10.7. SCHUR’S LEMMA 


Let ρ and ρ' be representations of a group G on vector spaces V and V'. A linear
transformation T: V' → V is called G-invariant if it is compatible with the operation of G,
meaning that for all g in G,


(10.7.1)    $T(gv') = gT(v'), \quad\text{or}\quad T\circ\rho'_g = \rho_g\circ T,$


as indicated by the diagram


(10.7.2)    $\begin{array}{ccc} V' & \xrightarrow{\;T\;} & V \\ {\scriptstyle\rho'_g}\big\downarrow & & \big\downarrow{\scriptstyle\rho_g} \\ V' & \xrightarrow{\;T\;} & V \end{array}$


A bijective G-invariant linear transformation is an isomorphism of representations (10.1.16).
It is useful to rewrite the condition for G-invariance in the form


$T(v') = g^{-1}T(gv'), \quad\text{or}\quad \rho_g^{-1}\,T\,\rho'_g = T.$

This definition of a G-invariant linear transformation T makes sense only when the
representations ρ and ρ' are given. It is important to keep this in mind when the ambiguous
group operation notation T(gv') = gT(v') is used.

If bases B and B' for V and V' are given, and if R_g, R'_g, and M denote the matrices of
ρ_g, ρ'_g, and T with respect to these bases, the condition (10.7.1) becomes


(10.7.3)    $MR'_g = R_gM \quad\text{or}\quad R_g^{-1}MR'_g = M$

for all g in G. A matrix M is called G-invariant if it satisfies this condition.


Lemma 10.7.4 The kernel and the image of a G-invariant linear transformation T: V' → V
are G-invariant subspaces of V' and V, respectively.


Proof. The kernel and image of any linear transformation are subspaces. To show that the
kernel is G-invariant, we must show that if x is in ker T, then gx is in ker T, i.e., that if
T(x) = 0, then T(gx) = 0. This is true: T(gx) = gT(x) = g0 = 0. If y is in the image of T,
i.e., y = T(x) for some x in V', then gy = gT(x) = T(gx), so gy is in the image too. □


Similarly, if ρ is a representation of G on V, a linear operator on V is G-invariant if

(10.7.5)    $T(gv) = gT(v), \quad\text{or}\quad \rho_g\circ T = T\circ\rho_g, \quad\text{for all } g \text{ in } G,$


which means that T commutes with each of the operators ρ_g. The matrix form of this
condition is

$R_gM = MR_g \quad\text{or}\quad M = R_g^{-1}MR_g, \quad\text{for all } g \text{ in } G.$


Because a G-invariant linear operator T must commute with all of the operators ρ_g,
invariance is a strong condition. Schur's Lemma shows this.


Theorem 10.7.6 Schur's Lemma.


(a) Let ρ and ρ' be irreducible representations of G on vector spaces V and V', respectively,
and let T: V' → V be a G-invariant transformation. Either T is an isomorphism, or else
T = 0.

(b) Let ρ be an irreducible representation of G on a vector space V, and let T: V → V be
a G-invariant linear operator. Then T is multiplication by a scalar: T = cI.


Proof. (a) Suppose that T is not the zero map. Since ρ' is irreducible and since ker T is
a G-invariant subspace, ker T is either V' or {0}. It is not V' because T ≠ 0. Therefore
ker T = {0}, and T is injective. Since ρ is irreducible and im T is G-invariant, im T is either
{0} or V. It is not {0} because T ≠ 0. Therefore im T = V and T is surjective.


(b) Suppose that T is a G-invariant linear operator on V. We choose an eigenvalue λ
of T. The linear operator S = T − λI is also G-invariant. The kernel of S isn't zero
because it contains an eigenvector of T. Therefore S is not an isomorphism. By (a),
S = 0 and T = λI. □


Suppose that we are given representations ρ and ρ' on spaces V and V'. Though
G-invariant linear transformations are rare, the averaging process can be used to create a
G-invariant transformation from any linear transformation T: V' → V. The average is the
linear transformation T̃ defined by


(10.7.7)    $\widetilde{T}(v') = \frac{1}{|G|}\sum_{g\in G} g^{-1}\bigl(T(gv')\bigr), \quad\text{or}\quad \widetilde{T} = \frac{1}{|G|}\sum_{g\in G}\rho_g^{-1}\,T\,\rho'_g.$


Similarly, if we are given matrix representations R and R' of G of dimensions n and m, and
if M is any m×n matrix, then the averaged matrix is


(10.7.8)    $\widetilde{M} = \frac{1}{|G|}\sum_{g\in G} R_g^{-1}MR'_g.$


Lemma 10.7.9 With the above notation, T̃ is a G-invariant linear transformation, and M̃ is a
G-invariant matrix. If T is G-invariant, then T̃ = T, and if M is G-invariant, then M̃ = M.


Proof. Since compositions and sums of linear transformations are linear, T̃ is a linear
transformation, and it is easy to see that T̃ = T if T is invariant. To show that T̃ is invariant,
we let h be an element of G and we show that T̃ = h^{-1}T̃h. We make the substitution
g_1 = gh. Reindexing as in (10.2.3),


$h^{-1}\widetilde{T}h = h^{-1}\Bigl(\frac{1}{|G|}\sum_g g^{-1}Tg\Bigr)h = \frac{1}{|G|}\sum_g (gh)^{-1}T(gh) = \frac{1}{|G|}\sum_{g_1} g_1^{-1}Tg_1 = \frac{1}{|G|}\sum_g g^{-1}Tg = \widetilde{T}.$


The proof that M̃ is invariant is analogous. □
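
To see the averaging process at work, the sketch below averages an arbitrary 3×3 matrix over the permutation representation of S_3 on C³, taking R' = R. This is an illustrative special case, not the general setting of the lemma: for permutation matrices the inverse is the transpose, which the code uses. Exact rational arithmetic keeps the invariance test free of rounding.

```python
from fractions import Fraction
from itertools import permutations

def perm_matrix(p):
    """Permutation matrix with column j equal to e_{p(j)}."""
    n = len(p)
    return [[Fraction(1 if p[j] == i else 0) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def average(M, reps):
    """(1/|G|) * sum over g of R_g^{-1} M R_g, using R^{-1} = R^t for permutation matrices."""
    n = len(M)
    total = [[Fraction(0)] * n for _ in range(n)]
    for R in reps:
        term = matmul(matmul(transpose(R), M), R)
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return [[entry / len(reps) for entry in row] for row in total]

reps = [perm_matrix(p) for p in permutations(range(3))]
M = [[Fraction(v) for v in row] for row in [[1, 2, 3], [4, 5, 6], [7, 8, 10]]]
M_avg = average(M, reps)
```

The averaged matrix has a constant diagonal and constant off-diagonal entries, commutes with every R_g, and has the same trace as M.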


The averaging process may yield T̃ = 0, the trivial transformation, though T was
not zero. Schur's Lemma tells us that this must happen if ρ and ρ' are irreducible and not
isomorphic. This fact is the basis of the proof given in the next section that distinct irreducible
characters are orthogonal. For linear operators, the average is often not zero, because trace
is preserved by the averaging process.


Proposition 10.7.10 Let ρ be an irreducible representation of G on a vector space V.
Let T: V → V be a linear operator, and let T̃ be as in (10.7.7), with ρ' = ρ. Then
trace T̃ = trace T. If trace T ≠ 0, then T̃ ≠ 0. □


10.8 PROOF OF THE ORTHOGONALITY RELATIONS 
We will now prove (a) of the Main Theorem. We use matrix notation. Let ℳ denote the
space C^{m×n} of m×n matrices.


Lemma 10.8.1 Let A and B be m×m and n×n matrices respectively, and let F be the linear
operator on ℳ defined by F(M) = AMB. The trace of F is the product (trace A)(trace B).

Proof. The trace of an operator is the sum of its eigenvalues. Let α_1, ..., α_m and β_1, ..., β_n
be the eigenvalues of A and B^t respectively. If X_i is an eigenvector of A with eigenvalue α_i,
and Y_j is an eigenvector of B^t with eigenvalue β_j, the m×n matrix M = X_i Y_j^t is an eigenvector
for the operator F, with eigenvalue α_iβ_j. Since the dimension of ℳ is mn, the mn complex
numbers α_iβ_j are all of the eigenvalues, provided that they are distinct. If so, then


$\operatorname{trace} F = \sum_{i,j}\alpha_i\beta_j = \Bigl(\sum_i\alpha_i\Bigr)\Bigl(\sum_j\beta_j\Bigr) = (\operatorname{trace} A)(\operatorname{trace} B).$


In general, there will be matrices A' and B' arbitrarily close to A and B such that the products
of their eigenvalues are distinct, and the lemma follows by continuity (see Section 5.2). □
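
The lemma can also be checked directly on the basis of matrix units, without eigenvalues. The sketch below applies F to each matrix unit E_ij and reads off the coefficient of E_ij in the image, which is the corresponding diagonal entry of the matrix of F. This is a different route from the proof above, offered only as a sanity check; the sample matrices A and B are arbitrary.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def trace_of_F(A, B):
    """Trace of the operator M -> A M B on m x n matrices, via the matrix units E_ij."""
    m, n = len(A), len(B)
    total = 0
    for i in range(m):
        for j in range(n):
            E = [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(m)]
            image = matmul(matmul(A, E), B)
            total += image[i][j]      # coefficient of E_ij in F(E_ij)
    return total

A = [[1, 2], [3, 4]]                       # 2 x 2, trace 5
B = [[2, 0, 1], [1, 3, 5], [4, 1, -1]]     # 3 x 3, trace 4
result = trace_of_F(A, B)
```

Since the (i, j) entry of A E_ij B is A_ii B_jj, the sum over all i and j factors as (trace A)(trace B).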


Let ρ' and ρ be representations of dimensions m and n, with characters χ' and χ
respectively, and let R' and R be the matrix representations obtained from ρ' and ρ using
some arbitrary bases. We define a linear operator Φ on the space ℳ by


(10.8.2)    $\Phi(M) = \frac{1}{|G|}\sum_g R_g^{-1}MR'_g = \widetilde{M}.$


In the last section, we saw that M̃ is a G-invariant matrix, and that M̃ = M if M is invariant.
Therefore the image of Φ is the space of G-invariant matrices. We denote that space by ℳ̃.

Parts (a) and (b) of the next lemma compute the trace of the operator Φ in two ways.
The orthogonality relations are part (c).


Lemma 10.8.3 With the above notation,
(a) trace Φ = ⟨χ, χ'⟩.
(b) trace Φ = dim ℳ̃.
(c) If ρ is an irreducible representation, ⟨χ, χ⟩ = 1, and if ρ and ρ' are non-isomorphic
irreducible representations, ⟨χ, χ'⟩ = 0.


Proof. (a) We recall that $\chi(g^{-1}) = \overline{\chi(g)}$ (10.4.2)(d). Let F_g denote the linear operator on
ℳ defined by F_g(M) = R_g^{-1}MR'_g.

Since trace is linear, Lemma 10.8.1 shows that

(10.8.4)    $\operatorname{trace}\Phi = \frac{1}{|G|}\sum_g \operatorname{trace}F_g = \frac{1}{|G|}\sum_g (\operatorname{trace}R_g^{-1})(\operatorname{trace}R'_g) = \frac{1}{|G|}\sum_g \chi(g^{-1})\chi'(g) = \frac{1}{|G|}\sum_g \overline{\chi(g)}\,\chi'(g) = \langle\chi,\chi'\rangle.$


(b) Let 𝒩 be the kernel of Φ. If M is in the intersection ℳ̃ ∩ 𝒩, then Φ(M) = M and also
Φ(M) = 0, so M = 0. The intersection is the zero space. Therefore ℳ is the direct sum
ℳ̃ ⊕ 𝒩 (4.3.1)(b). We choose a basis for ℳ by appending bases of ℳ̃ and 𝒩. Since M̃ = M
if M is invariant, Φ is the identity on ℳ̃. So the matrix of Φ will have the block form


$\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix},$


where I is the identity matrix of size dim ℳ̃. Its trace is equal to the dimension of ℳ̃.

(c) We apply (a) and (b): ⟨χ, χ'⟩ = dim ℳ̃. If ρ' and ρ are irreducible and not isomorphic,
Schur's Lemma tells us that the only G-invariant operator is zero, and so the only
G-invariant matrix is the zero matrix. Therefore ℳ̃ = {0} and ⟨χ, χ'⟩ = 0. If ρ' = ρ, Schur's
Lemma says that the G-invariant matrices have the form cI. Then ℳ̃ has dimension 1,
and ⟨χ, χ'⟩ = 1. □


We go over to operator notation for the proof of Theorem 10.4.6(b), that the number 
of irreducible characters is equal to the number of conjugacy classes in the group. As before, 
H denotes the space of class functions. Its dimension is equal to the number of conjugacy 
classes (see (10.4.11)). Let C denote the subspace of H spanned by the characters. We 
show that C = H by showing that the orthogonal space to C in H is zero. The next lemma 
does this. 


Lemma 10.8.5


(a) Let φ be a class function on G that is orthogonal to every character. For any representation
ρ of G, $\frac{1}{|G|}\sum_g \overline{\varphi(g)}\,\rho_g$ is the zero operator.

(b) Let ρ^reg be the regular representation of G. The operators ρ^reg_g with g in G are linearly
independent.

(c) The only class function φ that is orthogonal to every character is the zero function.


Proof. (a) Since any representation is a direct sum of irreducible representations, we may
assume that ρ is irreducible. Let $T = \frac{1}{|G|}\sum_g \overline{\varphi(g)}\,\rho_g$. We first show that T is a G-invariant
operator, i.e., that T = ρ_h^{-1}Tρ_h for every h in G. Let g'' = h^{-1}gh. Then as g runs over the
group G, so does g''. Since ρ is a homomorphism, ρ_h^{-1}ρ_gρ_h = ρ_{g''}, and because φ is a class
function, φ(g) = φ(g''). Therefore


$\rho_h^{-1}T\rho_h = \frac{1}{|G|}\sum_g \overline{\varphi(g)}\,\rho_{g''} = \frac{1}{|G|}\sum_g \overline{\varphi(g'')}\,\rho_{g''} = \frac{1}{|G|}\sum_g \overline{\varphi(g)}\,\rho_g = T.$


Let χ be the character of ρ. The trace of T is $\frac{1}{|G|}\sum_g \overline{\varphi(g)}\,\chi(g) = \langle\varphi,\chi\rangle$. The trace is
zero because φ is orthogonal to χ. Since ρ is irreducible, Schur's lemma tells us that T is
multiplication by a scalar, and since its trace is zero, T = 0.


(b) We apply Formula 10.6.8 to the basis element e_1 of V_G: ρ^reg_g(e_1) = e_g. Then since the
vectors e_g are independent elements of V_G, the operators ρ^reg_g are independent too.


(c) Let φ be a class function orthogonal to every character. (a) tells us that $\sum_g \overline{\varphi(g)}\,\rho^{reg}_g = 0$
is a linear relation among the operators ρ^reg_g, which are independent by (b). Therefore all of
the coefficients $\overline{\varphi(g)}$ are zero, and φ is the zero function. □


10.9 REPRESENTATIONS OF SU2 


Remarkably, the orthogonality relations carry over to compact groups, matrix groups that
are compact subsets of spaces of matrices, when summation over the group is replaced by
an integral. In this section, we verify this for some representations of the special unitary
group SU_2.


We begin by defining the representations that we will analyze. Let H_n denote the
complex vector space of homogeneous polynomials of degree n in the variables u, v, of
the form


(10.9.1)    $f(u,v) = c_0u^n + c_1u^{n-1}v + \cdots + c_{n-1}uv^{n-1} + c_nv^n.$

We define a representation

(10.9.2)    $\rho_n\colon SU_2 \to GL(H_n)$


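as follows: The result of operating by an element P of SU_2 on a polynomial f in H_n will be
another polynomial that we denote by [Pf]. The definition is


(10.9.3)    $[Pf](u,v) = f(ua - v\bar{b},\ ub + v\bar{a}), \quad\text{where } P = \begin{bmatrix} a & b \\ -\bar{b} & \bar{a} \end{bmatrix}.$


In words, P operates by substituting (u, v)P for the variables (u, v). Thus

$[Pu^iv^j] = (ua - v\bar{b})^i\,(ub + v\bar{a})^j.$


It is easy to compute the matrix of this operator when P is diagonal. Let α = e^{iθ}, and let


(10.9.4)    $A_\theta = \begin{bmatrix} e^{i\theta} & \\ & e^{-i\theta} \end{bmatrix} = \begin{bmatrix} \alpha & \\ & \bar\alpha \end{bmatrix} = \begin{bmatrix} \alpha & \\ & \alpha^{-1} \end{bmatrix}.$


Then $[A_\theta u^iv^j] = (u\alpha)^i(v\bar\alpha)^j = u^iv^j\alpha^{i-j}$. So A_θ acts on the basis (u^n, u^{n−1}v, ..., uv^{n−1}, v^n)
of the space H_n as the diagonal matrix

$\operatorname{diag}\bigl(\alpha^n,\ \alpha^{n-2},\ \ldots,\ \alpha^{-n}\bigr).$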

The character χ_n of the representation ρ_n is defined as before: χ_n(g) = trace ρ_{n,g}. It
is constant on the conjugacy classes, which are the latitudes on the sphere SU_2. Because of
this, it is enough to compute the characters χ_n on one matrix in each latitude, and we use
A_θ. To simplify notation, we write χ_n(θ) for χ_n(A_θ). The character is


$\chi_0(\theta) = 1$

$\chi_1(\theta) = \alpha + \alpha^{-1}$

$\chi_2(\theta) = \alpha^2 + 1 + \alpha^{-2}$

$\cdots$

(10.9.5)    $\chi_n(\theta) = \alpha^n + \alpha^{n-2} + \cdots + \alpha^{-n} = \frac{\alpha^{n+1} - \alpha^{-(n+1)}}{\alpha - \alpha^{-1}}.$
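
The closed form in (10.9.5) is a geometric series sum, and it is easy to test numerically. The sketch below compares the sum of the diagonal entries α^n, α^{n−2}, ..., α^{−n} with the quotient on the right, and also with sin((n+1)θ)/sin θ, which is the same quotient rewritten with α − α^{−1} = 2i sin θ. The function names and the sample angle are illustrative.

```python
import cmath
import math

def chi_sum(n, theta):
    """Trace of the diagonal matrix diag(a^n, a^(n-2), ..., a^(-n)), a = e^(i*theta)."""
    a = cmath.exp(1j * theta)
    return sum(a ** (n - 2 * k) for k in range(n + 1))

def chi_closed(n, theta):
    """Closed form (a^(n+1) - a^(-(n+1))) / (a - a^(-1)); needs sin(theta) != 0."""
    a = cmath.exp(1j * theta)
    return (a ** (n + 1) - a ** (-(n + 1))) / (a - a ** (-1))

theta = 0.7
values = [(chi_sum(n, theta), chi_closed(n, theta)) for n in range(6)]
```

The two expressions agree away from θ = 0 and θ = π, where the quotient is undefined but the sum still makes sense and equals ±(n + 1).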

The Hermitian product that replaces (10.4.3) is


(10.9.6)    $\langle\chi_m,\chi_n\rangle = \frac{1}{|G|}\int_G \overline{\chi_m(g)}\,\chi_n(g)\,dV.$


In this formula G stands for the group SU_2, the unit 3-sphere, |G| is the three-dimensional
volume of the unit sphere, and dV stands for the integral with respect to three-dimensional
volume. The characters happen to be real-valued functions, so the complex conjugation that
appears in the formula is irrelevant.


Theorem 10.9.7 The characters of SU_2 that are defined above are orthonormal: ⟨χ_m, χ_n⟩ = 0
if m ≠ n, and ⟨χ_n, χ_n⟩ = 1.


Proof. Since the characters are constant on the latitudes, we can evaluate the integral (10.9.6)
by slicing, as we learn to do in calculus. We use the unit circle x_0 = cos θ, x_1 = sin θ, and
x_2 = ··· = x_n = 0 to parametrize the slices of the unit n-sphere S^n: {x_0² + x_1² + ··· + x_n² = 1}.
So θ = 0 is the north pole, and θ = π is the south pole (see Section 9.2). For 0 < θ < π, the
slice of the unit n-sphere is an (n−1)-sphere of radius sin θ.


To compute an integral by slicing, we integrate with respect to arc length on the unit
circle. Let vol_n(r) denote the n-dimensional volume of the n-sphere of radius r. So vol_1(r)
is the arc length of the circle of radius r, and vol_2(r) is the surface area of the 2-sphere of
radius r. If f is a function on the unit n-sphere S^n that is constant on the slices θ = c, its
integral will be


(10.9.8)    $\int_{S^n} f\,dV_n = \int_0^{\pi} f(\theta)\,\operatorname{vol}_{n-1}(\sin\theta)\,d\theta,$


where dV_n denotes integration with respect to n-dimensional volume, and f(θ) denotes the
value of f on the slice.


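Integration by slicing provides a recursive formula for the volumes of the spheres:


(10.9.9)    $\operatorname{vol}_n(1) = \int_{S^n} 1\,dV_n = \int_0^{\pi}\operatorname{vol}_{n-1}(\sin\theta)\,d\theta,$


and vol_n(r) = r^n vol_n(1). The zero-sphere x_0² = r² consists of two points. Its
zero-dimensional volume is 2. So

$\operatorname{vol}_1(r) = r\int_0^{\pi}\operatorname{vol}_0(\sin\theta)\,d\theta = r\int_0^{\pi} 2\,d\theta = 2\pi r,$

(10.9.10)    $\operatorname{vol}_2(r) = r^2\int_0^{\pi}\operatorname{vol}_1(\sin\theta)\,d\theta = r^2\int_0^{\pi} 2\pi\sin\theta\,d\theta = 4\pi r^2,$

$\operatorname{vol}_3(r) = r^3\int_0^{\pi}\operatorname{vol}_2(\sin\theta)\,d\theta = r^3\int_0^{\pi} 4\pi\sin^2\theta\,d\theta = 2\pi^2r^3.$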
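
The recursion (10.9.9) translates directly into a short numerical routine. The sketch below uses the scaling rule vol_{n−1}(sin θ) = sin^{n−1}θ · vol_{n−1}(1) and a composite midpoint rule for the integral; the integration scheme, step count, and function names are choices made for this illustration and are not part of the text.

```python
import math
from functools import lru_cache

def integrate(f, a, b, steps=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

@lru_cache(maxsize=None)
def unit_volume(n):
    """n-dimensional volume of the unit n-sphere, by slicing as in (10.9.9)."""
    if n == 0:
        return 2.0                       # the zero-sphere is two points
    prev = unit_volume(n - 1)
    # vol_{n-1}(sin t) = sin(t)^(n-1) * vol_{n-1}(1)
    return integrate(lambda t: prev * math.sin(t) ** (n - 1), 0.0, math.pi)

def volume(n, r):
    return r ** n * unit_volume(n)

computed = [unit_volume(n) for n in range(4)]
```

The list `computed` approximates 2, 2π, 4π, and 2π², matching the values in (10.9.10) with r = 1.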


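To evaluate the last integral, it is convenient to use the formula sin θ = −i(α − α^{-1})/2:


(10.9.11)    $\operatorname{vol}_2(\sin\theta) = 4\pi\sin^2\theta = -\pi(\alpha - \alpha^{-1})^2.$

Expanding, vol_2(sin θ) = π(2 − (α² + α^{-2})). The integral of α² + α^{-2} is zero:


(10.9.12)    $\int_0^{\pi}(\alpha^k + \alpha^{-k})\,d\theta = \int_0^{2\pi}\alpha^k\,d\theta = \begin{cases} 0 & \text{if } k > 0 \\ 2\pi & \text{if } k = 0. \end{cases}$


We now compute the integral (10.9.6). The volume of the group SU_2 is

(10.9.13)    $\operatorname{vol}_3(1) = 2\pi^2.$

The latitude sphere that contains A_θ has radius sin θ. Since the characters are real, integration
by slicing gives

(10.9.14)

$\langle\chi_m,\chi_n\rangle = \frac{1}{2\pi^2}\int_0^{\pi}\chi_m(\theta)\,\chi_n(\theta)\,\operatorname{vol}_2(\sin\theta)\,d\theta$

$= \frac{1}{2\pi^2}\int_0^{\pi}\Bigl(\frac{\alpha^{m+1} - \alpha^{-(m+1)}}{\alpha - \alpha^{-1}}\Bigr)\Bigl(\frac{\alpha^{n+1} - \alpha^{-(n+1)}}{\alpha - \alpha^{-1}}\Bigr)\bigl(-\pi(\alpha - \alpha^{-1})^2\bigr)\,d\theta$

$= -\frac{1}{2\pi}\int_0^{\pi}\bigl(\alpha^{m+n+2} + \alpha^{-(m+n+2)}\bigr)\,d\theta + \frac{1}{2\pi}\int_0^{\pi}\bigl(\alpha^{m-n} + \alpha^{n-m}\bigr)\,d\theta.$


This evaluates to 1 if m = n and to zero otherwise (see (10.9.12)). The characters χ_n are
orthonormal. □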

We won't prove the next theorem, though the proof follows the case of finite groups
fairly closely. If you are interested, see [Sepanski].


Theorem 10.9.15 Every continuous representation of SU_2 is isomorphic to a direct sum of
the representations ρ_n (10.9.2).


We leave the obvious generalizations to the reader.


—Israel Herstein


EXERCISES 


Section 1 Definitions 
1.1. Show that the image of a representation of dimension 1 of a finite group is a cyclic group.
1.2. (a) Choose a suitable basis for R³ and write the standard representation of the octahedral
group O explicitly. (b) Do the same for the dihedral group D_n.


Section 2 Irreducible Representations 


2.1. Prove that the standard three-dimensional representation of the tetrahedral group T is 
irreducible as a complex representation. 


2.2. Consider the standard two-dimensional representation of the dihedral group D_n. For
which n is this an irreducible complex representation?


2.3. Suppose given a representation of the symmetric group S_3 on a vector space V. Let x
and y denote the usual generators for S_3.

(a) Let u be a nonzero vector in V. Let v = u + xu + x²u and w = u + yu. By
analyzing the G-orbits of v, w, show that V contains a nonzero invariant subspace
of dimension at most 2.

(b) Prove that all irreducible two-dimensional representations of G are isomorphic, and
determine all irreducible representations of G.


Section 3 Unitary Representations 


3.1. Let G be a cyclic group of order 3. The matrix A = […] has order 3, so it defines
a matrix representation of G. Use the averaging process to produce a G-invariant form
from the standard Hermitian product X*Y on C².


3.2. Let ρ: G → GL(V) be a representation of a finite group on a real vector space V. Prove
the following:

(a) There exists a G-invariant, positive definite symmetric form ⟨ , ⟩ on V.
(b) ρ is a direct sum of irreducible representations.
(c) Every finite subgroup of GL_n(R) is conjugate to a subgroup of O_n.


3.3. (a) Let R: G → SL_2(R) be a faithful representation of a finite group by real 2×2
matrices with determinant 1. Use the results of Exercise 3.2 to prove that G is a
cyclic group.

(b) Determine the finite groups that have faithful real two-dimensional representations.

(c) Determine the finite groups that have faithful real three-dimensional representations
with determinant 1.


3.4. Let ⟨ , ⟩ be a nondegenerate skew-symmetric form on a vector space V, and let ρ be
a representation of a finite group G on V. Prove that the averaging process (10.3.7)
produces a G-invariant skew-symmetric form on V, and show by example that the form
obtained in this way needn't be nondegenerate.


3.5. Let x be a generator of a cyclic group G of order p. Sending x ↦ $\begin{bmatrix} 1 & 1 \\ & 1 \end{bmatrix}$ defines a
matrix representation G → GL_2(F_p). Prove that this representation is not the direct
sum of irreducible representations.


Section 4 Characters 


4.1. Find the dimensions of the irreducible representations of the octahedral group, the
quaternion group, and the dihedral groups D_4, D_5, and D_6.

4.2. A nonabelian group G has order 55. Determine its class equation and the dimensions of
its irreducible characters.

4.3. Determine the character tables for

(a) the Klein four group,
(b) the quaternion group,
(c) the dihedral group D_4,
(d) the dihedral group D_6,
(e) a nonabelian group of order 21 (see Proposition 7.7.7).


4.4. Let G be the dihedral group D_5, presented with generators x, y and relations x⁵ = 1,
y² = 1, yxy^{-1} = x^{-1}, and let χ be an arbitrary two-dimensional character of G.


(a) What does the relation x⁵ = 1 tell us about χ(x)?
(b) What does the fact that x and x^{-1} are conjugate tell us about χ(x)?
(c) Determine the character table of G.
(d) Decompose the restriction of each irreducible character of D_5 into irreducible
characters of C_5.


4.5. Let G = ⟨x, y | x⁵, y⁴, yxy^{-1}x^{-2}⟩. Determine the character table of G.


4.6. Explain how to adjust the entries of a character table to produce a unitary matrix, and 
prove that the columns of a character table are orthogonal. 


4.7. Let π: G → G' = G/N be the canonical map from a finite group to a quotient group,
and let ρ' be an irreducible representation of G'. Prove that the representation ρ = ρ' ∘ π
of G is irreducible in two ways: directly, and using Theorem 10.4.6.


4.8. Find the missing rows in the character table below:


          (1)   (3)   (6)   (6)   (8)

χ_1        1     1     1     1     1
χ_2        1     1    -1    -1     1
χ_3        3    -1     1    -1     0
χ_4        3    -1    -1     1     0


*4.9. Below is a partial character table. One conjugacy class is missing.


          (1)   (1)   (2)   (2)   (3)
           1     u     v     w     x

χ_1        1     1     1     1     1
χ_2        1     1     1     1    -1
χ_3        1    -1     1    -1     i
χ_4        1    -1     1    -1    -i
χ_5        2     2    -1    -1     0


(a) Complete the table.
(b) Determine the orders of representative elements in each conjugacy class.
(c) Determine the normal subgroups.
(d) Describe the group.


4.10. (a) Find the missing rows in the character table below.
(b) Determine the orders of the elements a, b, c, d.
(c) Show that the group G with this character table has a subgroup H of order 10, and
describe this subgroup as a union of conjugacy classes.
(d) Decide whether H is C_10 or D_5.
(e) Determine all normal subgroups of G.


          (1)   (4)   (5)   (5)   (5)
           1     a     b     c     d

χ_1        1     1     1     1     1
χ_2        1     1    -1    -1     1
χ_3        1     1    -i     i    -1
χ_4        1     1     i    -i    -1


*4.11. In the character table below, ω = e^{2πi/3}.


Cj (6)) Ge C7 RG) 7) 
Lea > ceed x<ciaar 

X1 il 1 1 1 1, 1 1 
X2 1 1 1 @o@ @® w @ 
x3 1 i! 1 © ®8 BOB w 
X4 i 1 -1 -w -® @® @© 
PG 1 1 -1 -0© -w© @© @® 

X6 1 iat =) eer 1 1 
x7 6 -1 Oo 050 Oo 


(a) Show that G has a normal subgroup N isomorphic to D7. 

(b) Decompose the restrictions of each character to N into irreducible N-characters. 
(c) Determine the numbers of Sylow p-subgroups, for p = 2, 3, and 7. 

(d) Determine the orders of the representative elements c, d, e, f. 

(e) Determine all normal subgroups of G. 


4.12. Let H be a subgroup of index 2 of a group G, and let σ: H → GL(V) be a representation.
Let a be an element of G not in H. Define a conjugate representation σ': H →
GL(V) by the rule σ'(h) = σ(a^{-1}ha). Prove that


(a) σ' is a representation of H.

(b) If σ is the restriction to H of a representation of G, then σ' is isomorphic to σ.

(c) If b is another element of G not in H, then the representation σ''(h) = σ(b^{-1}hb) is
isomorphic to σ'.


Section 5 One-Dimensional Characters


5.1. Decompose the standard two-dimensional representation of the cyclic group C_n by
rotations into irreducible (complex) representations.

5.2. Prove that the sign representation p ↦ sign p and the trivial representation are the only
one-dimensional representations of the symmetric group S_n.

5.3. Suppose that a group G has exactly two irreducible characters of dimension 1, and let χ
denote the nontrivial one-dimensional character. Prove that for all g in G, χ(g) = ±1.

5.4. Let χ be the character of a representation ρ of dimension d. Prove that |χ(g)| ≤ d for
all g in G, and that if |χ(g)| = d, then ρ(g) = ζI, for some root of unity ζ. Moreover, if
χ(g) = d, then ρ_g is the identity operator.

5.5. Prove that the one-dimensional characters of a group G form a group under multiplication
of functions. This group is called the character group of G, and is often denoted by Ĝ.
Prove that if G is abelian, then |Ĝ| = |G| and Ĝ ≈ G.

5.6. Let G be a cyclic group of order n, generated by an element x, and let ζ = e^{2πi/n}.
(a) Prove that the irreducible representations are ρ_0, ..., ρ_{n−1}, where ρ_k: G → C^× is
defined by ρ_k(x) = ζ^k.
(b) Identify the character group of G (see Exercise 5.5).

5.7. (a) Let φ: G → G' be a homomorphism of abelian groups. Define an induced
homomorphism φ̂: Ĝ' → Ĝ between their character groups (see Exercise 5.5).
(b) Prove that if φ is injective, then φ̂ is surjective, and conversely.


Section 6 The Regular Representation


6.1. Let R^reg denote the regular matrix representation of a group G. Determine Σ_g R^reg_g.

6.2. Let ρ be the permutation representation associated to the operation of D_3 on itself by
conjugation. Decompose the character of ρ into irreducible characters.

6.3. Let χ^e denote the character of the representation of the tetrahedral group T on the six
edges of the tetrahedron. Decompose this character into irreducible characters.

6.4. (a) Identify the five conjugacy classes in the octahedral group O, and find the orders of
its irreducible representations.

(b) The group O operates on these sets:
• six faces of the cube,
• three pairs of opposite faces,
• eight vertices,
• four pairs of opposite vertices,
• six pairs of opposite edges,
• two inscribed tetrahedra.
Decompose the corresponding characters into irreducible characters.

(c) Compute the character table for O.

6.5. The symmetric group S_n operates on C^n by permuting the coordinates. Decompose this
representation explicitly into irreducible representations.
Hint: I recommend against using the orthogonality relations. This problem is closely
related to Exercise M.1 from Chapter 4.

6.6. Decompose the characters of the representations of the icosahedral group on the sets of
faces, edges, and vertices into irreducible characters.

6.7. The group S_5 operates by conjugation on its normal subgroup A_5. How does this action
operate on the isomorphism classes of irreducible representations of A_5?


6.8. The stabilizer in the icosahedral group of one of the cubes inscribed in a dodecahedron
is the tetrahedral group T. Decompose the restrictions to T of the irreducible characters
of I.

6.9. (a) Explain how one can prove that a group is simple by looking at its character
table.
(b) Use the character table of the icosahedral group to prove that it is a simple group.

6.10. Determine the character tables for the nonabelian groups of order 12
(see (7.8.1)).

6.11. The character table for the group G = PSL_2(F_7) is below, with γ = ½(−1 + √7 i),
γ' = ½(−1 − √7 i).


          (1)   (21)   (24)   (24)   (42)   (56)
           1     a      b      c      d      e

χ_1        1     1      1      1      1      1
χ_2        3    -1      γ      γ'     1      0
χ_3        3    -1      γ'     γ      1      0
χ_4        6     2     -1     -1      0      0
χ_5        7    -1      0      0     -1      1
χ_6        8     0      1      1      0     -1


(a) Use it to give two proofs that this group is simple.
(b) Identify, so far as possible, columns that correspond to the conjugacy classes of the
elements

$\begin{bmatrix} 1 & 1 \\ & 1 \end{bmatrix}, \qquad \begin{bmatrix} 2 & \\ & 4 \end{bmatrix},$

and find matrices that represent the remaining conjugacy classes.
(c) G operates on the set of eight one-dimensional subspaces of F_7². Decompose the
associated character into irreducible characters.


Section 7 Schur’s Lemma 


7.1. Prove a converse to Schur’s Lemma: If $\rho$ is a representation, and if the only G-invariant
linear operators on V are multiplications by scalars, then $\rho$ is irreducible.

7.2. Let A be the standard representation (10.1.3) of the symmetric group $S_3$, and let
B = [matrix illegible in source]. Use the averaging process to produce a G-invariant linear
operator from left multiplication by B.

7.3. The matrices [entries illegible in source] define a representation R of the
group $S_3$. Let $\varphi$ be the linear transformation $\mathbb{C}^1 \to \mathbb{C}^3$ whose matrix is $(1, 0, 0)^t$. Use the
averaging method to produce a G-invariant linear transformation from $\varphi$, using the sign
representation $\Sigma$ of (10.1.4) on $\mathbb{C}^1$ and the representation R on $\mathbb{C}^3$.

7.4. Let $\rho$ be a representation of G and let C be a conjugacy class in G. Show that the linear
operator $T = \sum_{g \in C} \rho_g$ is G-invariant.

7.5. Let $\rho$ be a representation of a group G on V and let $\chi$ be a character of G, not necessarily
the character of $\rho$. Prove that the linear operator $T = \sum_g \chi(g)\rho_g$ on V is G-invariant.

7.6. Compute the matrix of the operator F of Lemma 10.8.1, and use the matrix to verify the
formula for its trace.


Section 9 Representations of $SU_2$

9.1. Calculate the four-dimensional volume of the 4-ball $B^4$ of radius r in $\mathbb{R}^4$, the locus
$x_0^2 + \cdots + x_3^2 \le r^2$, by slicing with three-dimensional slices. Check your answer by
[illegible in source].

9.2. Verify the associative law [Q[Pf]] = [(QP)f] for the operation (10.9.3).

9.3. Prove that the orthogonal representation (9.4.1) $SU_2 \to SO_3$ is irreducible.

9.4. Left multiplication defines a representation of $SU_2$ on the space $\mathbb{R}^4$ with coordinates
$x_0, \ldots, x_3$, as in Section 9.3. Decompose the associated complex representation into
irreducible representations.

9.5. Use Theorem 10.9.14 to determine the irreducible representations of the rotation group
$SO_3$.

9.6. (representations of the circle group) All representations here are assumed to be
differentiable functions of $\theta$. Let G be the circle group $\{e^{i\theta}\}$.

(a) Let $\rho$ be a representation of G on a vector space V. Show that there exists a positive
definite G-invariant Hermitian form on V.

(b) Prove Maschke’s Theorem for G.

(c) Describe the representations of G in terms of one-parameter groups, and use that
description to prove that the irreducible representations are one-dimensional.

(d) Verify the orthogonality relations, using an analogue of the Hermitian product
(10.9.6).

9.7. Using the results of Exercise 9.6, determine the irreducible representations of the
orthogonal group $O_2$.


Miscellaneous Problems 


M.1. The representations in this problem are real. A molecule M in ‘Flatland’ (a two-
dimensional world) consists of three like atoms $a_1, a_2, a_3$ forming a triangle. The triangle
is equilateral at time $t_0$, its center is at the origin, and $a_1$ is on the positive x-axis. The group
G of symmetries of M at time $t_0$ is the dihedral group $D_3$. We list the velocities of the
individual atoms at $t_0$ and call the resulting six-dimensional vector $v = (v_1, v_2, v_3)^t$ the
state of M. The operation of G on the space V of state vectors defines a six-dimensional
matrix representation S. For example, the rotation $\rho$ by $2\pi/3$ about the origin permutes
the atoms cyclically, and at the same time it rotates them.

(a) Let r be the reflection about the x-axis. Determine the matrices $S_\rho$ and $S_r$.

(b) Determine the space W of vectors fixed by $S_\rho$, and show that W is G-invariant.

(c) Decompose W and V explicitly into direct sums of irreducible G-invariant subspaces.

(d) Explain the subspaces found in (c) in terms of motions and vibrations of the molecule.


M.2. What can be said about a group that has exactly three irreducible characters, of dimensions
1, 2, and 3, respectively?

M.3. Let $\rho$ be a representation of a group G. In each of the following cases, decide whether or
not $\rho'$ is a representation, and whether or not it is necessarily isomorphic to $\rho$.

(a) x is a fixed element of G, and $\rho'_g = \rho_{xgx^{-1}}$.

(b) $\varphi$ is an automorphism of G, and $\rho'_g = \rho_{\varphi(g)}$.

(c) $\sigma$ is a one-dimensional representation of G, and $\rho'_g = \sigma_g \rho_g$.

M.4. Prove that an element z of a group G is in the center of G if and only if for all irreducible
representations $\rho$, $\rho(z)$ is multiplication by a scalar.

M.5. Let A, B be commuting matrices such that some positive power of each matrix is the
identity. Prove that there is an invertible matrix P such that $PAP^{-1}$ and $PBP^{-1}$ are both
diagonal.

M.6. Let $\rho$ be an irreducible representation of a finite group G. How unique is the positive
definite G-invariant Hermitian form?

M.7. Describe the commutator subgroup of a group G in terms of the character table.

M.8. Prove that a finite simple group that is not of prime order has no nontrivial representation
of dimension 2.

*M.9. Let H be a subgroup of index 2 of a finite group G. Let a be an element of G that is not
in H, so that H and aH are the two cosets of H.

(a) Given a matrix representation $S: H \to GL_n$ of the subgroup H, the induced
representation $\operatorname{ind} S: G \to GL_{2n}$ of the group G is defined by

$(\operatorname{ind} S)_h = \begin{bmatrix} S_h & 0 \\ 0 & S_{a^{-1}ha} \end{bmatrix}, \qquad (\operatorname{ind} S)_g = \begin{bmatrix} 0 & S_{ga} \\ S_{a^{-1}g} & 0 \end{bmatrix}$

for h in H and g in aH. Prove that ind S is a representation of G, and describe its
character.
Note: The element $a^{-1}ha$ will be in H, but because a is not in H, it needn’t be a
conjugate of h in H.

(b) If $R: G \to GL_n$ is a matrix representation of G, we may restrict it to H. We denote
the restriction by $\operatorname{res} R: H \to GL_n$. Prove that $\operatorname{res}(\operatorname{ind} S) \approx S \oplus S'$, where $S'$ is the
conjugate representation defined by $S'_h = S_{a^{-1}ha}$.

(c) Prove Frobenius reciprocity: $\langle \chi_{\operatorname{ind} S}, \chi_R \rangle = \langle \chi_S, \chi_{\operatorname{res} R} \rangle$.

(d) Let S be an irreducible representation of H. Use Frobenius reciprocity to prove that if
S is not isomorphic to the conjugate representation $S'$, then the induced representation
ind S is irreducible, and on the other hand, if S and $S'$ are isomorphic, then ind S is a
sum of two non-isomorphic representations of G.

*M.10. Let H be a subgroup of index 2 of a group G, and let R be a matrix representation of G.
Let $R'$ denote the representation defined by $R'_g = R_g$ if $g \in H$, and $R'_g = -R_g$ otherwise.

(a) Show that $R'$ is isomorphic to R if and only if the character of R is identically zero on
the coset gH not equal to H.

(b) Use Frobenius reciprocity (Exercise M.9) to show that $\operatorname{ind}(\operatorname{res} R) \approx R \oplus R'$.

(c) Suppose that R is irreducible. Show that if R is not isomorphic to $R'$, then res R is
irreducible, and if these two representations are isomorphic, then res R is a sum of
two irreducible representations of H.

*M.11. Derive the character table of $S_n$ using induced representations from $A_n$, when
(a) n = 3, (b) n = 4, (c) n = 5.

*M.12. Derive the character table of the dihedral group $D_n$, using induced representations
from $C_n$.

M.13. Let G be a finite subgroup of $GL_n(\mathbb{C})$. Prove that if $\sum_g \operatorname{trace} g = 0$, then $\sum_g g = 0$.

M.14. Let $\rho: G \to GL(V)$ be a two-dimensional representation of a finite group G, and
assume that 1 is an eigenvalue of $\rho_g$ for every g in G. Prove that $\rho$ is a sum of two
one-dimensional representations.

M.15. Let $\rho: G \to GL_n(\mathbb{C})$ be an irreducible representation of a finite group G. Given a
representation $\sigma: GL_n \to GL(V)$ of $GL_n$, we can consider the composition $\sigma \circ \rho$ as a
representation of G.

(a) Determine the character of the representation obtained in this way when $\sigma$ is left
multiplication of $GL_n$ on the space V of $n \times n$ matrices. Decompose $\sigma \circ \rho$ into
irreducible representations in this case.

(b) Determine the character of $\sigma \circ \rho$ when $\sigma$ is the operation of conjugation on $\mathbb{C}^{n \times n}$.

CHAPTER 11


Rings 


Bitte vergiß alles, was Du auf der Schule gelernt hast;
denn Du hast es nicht gelernt. 


—Edmund Landau 


11.1 DEFINITION OF A RING 


Rings are algebraic structures closed under addition, subtraction, and multiplication, but not 
under division. The integers form our basic model for this concept. 

Before going to the definition of a ring, we look at a few examples, subrings of the 
complex numbers. A subring of C is a subset which is closed under addition, subtraction and 
multiplication, and which contains 1. 


• The Gauss integers, the complex numbers of the form a + bi, where a and b are integers,
form a subring of C that we denote by Z[i]:


(11.1.1) $\mathbb{Z}[i] = \{a + bi \mid a, b \in \mathbb{Z}\}.$


Its elements are the points of a square lattice in the complex plane. 


We can form a subring $\mathbb{Z}[\alpha]$ analogous to the ring of Gauss integers, starting with any
complex number $\alpha$: the subring generated by $\alpha$. This is the smallest subring of C that contains
$\alpha$, and it can be described in a general way. If a ring contains $\alpha$, then it contains all positive
powers of $\alpha$ because it is closed under multiplication. It also contains sums and differences
of such powers, and it contains 1. Therefore it contains every complex number $\beta$ that can
be expressed as an integer combination of powers of $\alpha$, or, saying this another way, can be
obtained by evaluating a polynomial with integer coefficients at $\alpha$:


(11.1.2) $\beta = a_n\alpha^n + \cdots + a_1\alpha + a_0$, where the $a_i$ are in $\mathbb{Z}$.


On the other hand, the set of all such numbers is closed under the operations $+$, $-$, and $\times$,
and it contains 1. So it is the subring generated by $\alpha$.

In most cases, $\mathbb{Z}[\alpha]$ will not be represented as a lattice in the complex plane. For
example, the ring $\mathbb{Z}[\frac12]$ consists of the rational numbers that can be expressed as a polynomial
in $\frac12$ with integer coefficients. These rational numbers can be described simply as those whose
denominators are powers of 2. They form a dense subset of the real line.

• A complex number $\alpha$ is algebraic if it is a root of a (nonzero) polynomial with integer
coefficients, that is, if some expression of the form (11.1.2) evaluates to zero. If there is no
polynomial with integer coefficients having $\alpha$ as a root, $\alpha$ is transcendental. The numbers $e$
and $\pi$ are transcendental, though it isn’t very easy to prove this.


When $\alpha$ is transcendental, two distinct polynomial expressions (11.1.2) represent distinct
complex numbers. Then the elements of the ring $\mathbb{Z}[\alpha]$ correspond bijectively to polynomials
$p(x)$ with integer coefficients, by the rule $p(x) \rightsquigarrow p(\alpha)$. When $\alpha$ is algebraic there will be
many polynomial expressions that represent the same complex number. Some examples of
algebraic numbers are: $i + 3$, $1/7$, $7 + \sqrt{2}$, and $\sqrt{3} + \sqrt{-5}$.


The definition of a ring is similar to that of a field (3.2.2). The only difference is that
multiplicative inverses aren’t required:


Definition 11.1.3 $(+, -, \times, 1)$ A ring R is a set with two laws of composition $+$ and $\times$, called
addition and multiplication, that satisfy these axioms:


(a) With the law of composition $+$, R is an abelian group that we denote by $R^+$; its identity
is denoted by 0.

(b) Multiplication is commutative and associative, and has an identity denoted by 1.

(c) distributive law: For all a, b, and c in R, $(a + b)c = ac + bc$.


A subring of a ring is a subset that is closed under the operations of addition, subtraction,
and multiplication and that contains the element 1.


Note: There is a related concept, of a noncommutative ring: a structure that satisfies all
axioms of (11.1.3) except the commutative law for multiplication. The set of all real $n \times n$
matrices is one example. Since we won’t be studying noncommutative rings, we use the word
“ring” to mean “commutative ring.” □


Aside from subrings of C, the most important rings are polynomial rings. A polynomial
in x with coefficients in a ring R is an expression of the form


(11.1.4) $a_n x^n + \cdots + a_1 x + a_0,$


with $a_i$ in R. The set of these polynomials forms a ring that we discuss in the next section.


Another example: The set $\mathcal{R}$ of continuous real-valued functions of a real variable x
forms a ring, with addition and multiplication of functions: $[f + g](x) = f(x) + g(x)$ and
$[fg](x) = f(x)g(x)$.


There is a ring that contains just one element, 0; it is called the zero ring. In the
definition of a field (3.2.2), the set $F^\times$ obtained by deleting 0 is a group that contains the
multiplicative identity 1. So 1 is not equal to 0 in a field. The relation 1 = 0 hasn’t been ruled
out in a ring, but it occurs only once:


Proposition 11.1.5 A ring R in which the elements 1 and 0 are equal is the zero ring.

Proof. We first note that $0a = 0$ for every element a of a ring R. The proof is the same as
for vector spaces: $0 = 0a - 0a = (0 - 0)a = 0a$. Assume that 1 = 0 in R, and let a be any
element. Then $a = 1a = 0a = 0$. The only element of R is 0. □

Though elements of a ring aren’t required to have multiplicative inverses, a particular
element may have an inverse, and the inverse is unique if it exists.


• A unit of a ring is an element that has a multiplicative inverse.


The units in the ring of integers are 1 and −1, and the units in the ring of Gauss integers
are $\pm 1$ and $\pm i$. The units in the ring R[x] of real polynomials are the nonzero constant
polynomials. Fields are rings in which $0 \ne 1$ and in which every nonzero element is a unit.

The identity element 1 of a ring is always a unit, and any reference to “the” unit
element in R refers to the identity element. The ambiguous term “unit” is poorly chosen,
but it is too late to change it.


11.2 POLYNOMIAL RINGS 


• A polynomial with coefficients in a ring R is a (finite) linear combination of powers of the
variable:


(11.2.1) $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0,$


where the coefficients $a_i$ are elements of R. Such an expression is sometimes called a formal
polynomial, to distinguish it from a polynomial function. Every formal polynomial with real
coefficients determines a polynomial function on the real numbers. But we use the word
polynomial to mean formal polynomial.
The set of polynomials with coefficients in a ring R will be denoted by R[x]. Thus Z[x]
denotes the set of polynomials with integer coefficients, the set of integer polynomials.
The monomials $x^i$ are considered independent. So if


(11.2.2) $g(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0$


is another polynomial with coefficients in R, then f(x) and g(x) are equal if and only if
$a_i = b_i$ for all $i = 0, 1, 2, \ldots$.


• The degree of a nonzero polynomial, which may be denoted by deg f, is the largest integer
n such that the coefficient $a_n$ of $x^n$ is not zero. A polynomial of degree zero is called a
constant polynomial. The zero polynomial is also called a constant polynomial, but its degree
will not be defined.

The nonzero coefficient of highest degree of a polynomial is its leading coefficient, and
a monic polynomial is one whose leading coefficient is 1.


The possibility that some coefficients of a polynomial may be zero creates a nuisance.
We have to disregard terms with zero coefficient, so the polynomial f(x) can be written
in more than one way. This is irritating because it isn’t an interesting point. One way to
avoid ambiguity is to imagine listing the coefficients of all monomials, whether zero or not.
This allows efficient verification of the ring axioms. So for the purpose of defining the ring
operations, we write a polynomial as


(11.2.3) $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots,$

where the coefficients $a_i$ are all in the ring R and only finitely many of them are different
from zero. This polynomial is determined by its vector (or sequence) of coefficients $a_i$:


(11.2.4) $a = (a_0, a_1, \ldots),$


where the $a_i$ are elements of R, all but a finite number zero. Every such vector corresponds to a
polynomial.

When R is a field, these infinite vectors form the vector space Z with the infinite
basis $e_i$ that was defined in (3.7.2). The vector $e_i$ corresponds to the monomial $x^i$, and the
monomials form a basis of the space of all polynomials.


The definitions of addition and multiplication of polynomials mimic the familiar
operations on polynomial functions. If f(x) and g(x) are polynomials, then with notation
as above, their sum is


(11.2.5) $f(x) + g(x) = (a_0 + b_0) + (a_1 + b_1)x + \cdots = \sum_k (a_k + b_k)x^k,$


where the notation $(a_i + b_i)$ refers to addition in R. So if we think of a polynomial as a
vector, addition is vector addition: $a + b = (a_0 + b_0, a_1 + b_1, \ldots)$.
The product of polynomials f and g is computed by expanding the product:


(11.2.6) $f(x)g(x) = (a_0 + a_1 x + \cdots)(b_0 + b_1 x + \cdots) = \sum_{i,j} a_i b_j x^{i+j},$


where the products $a_i b_j$ are to be evaluated in the ring R. There will be finitely many
nonzero coefficients $a_i b_j$. This is a correct formula, but the right side is not in the standard
form (11.2.3), because the same monomial $x^n$ appears several times: once for each pair i, j
of indices such that $i + j = n$. So terms have to be collected on the right side. This leads to
the definition


(11.2.7) $f(x)g(x) = p_0 + p_1 x + p_2 x^2 + \cdots,$ with $p_k = \sum_{i+j=k} a_i b_j$:

$p_0 = a_0 b_0, \quad p_1 = a_0 b_1 + a_1 b_0, \quad p_2 = a_0 b_2 + a_1 b_1 + a_2 b_0, \quad \ldots$


Each $p_k$ is evaluated using the laws of composition in the ring. However, when making
computations, it may be desirable to defer the collection of terms temporarily.
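
The sum and product formulas can be tried out directly on coefficient vectors. The following is a minimal sketch, assuming integer coefficients and a polynomial stored as a Python list [a_0, a_1, ...] with trailing zeros removed; the function names trim, poly_add, and poly_mul are illustrative choices, not notation from the text.

```python
from itertools import zip_longest


def trim(coeffs):
    """Drop trailing zero coefficients so every polynomial has one standard form."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


def poly_add(a, b):
    """Formula (11.2.5): add the coefficient vectors entry by entry."""
    return trim(x + y for x, y in zip_longest(a, b, fillvalue=0))


def poly_mul(a, b):
    """Formula (11.2.7): p_k is the sum of a_i * b_j over all pairs with i + j = k."""
    if not a or not b:
        return []
    p = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj   # collect the term a_i b_j x^(i+j)
    return trim(p)


f = [1, 2]       # 1 + 2x
g = [3, 0, 1]    # 3 + x^2
print(poly_add(f, g))   # [4, 2, 1]
print(poly_mul(f, g))   # [3, 6, 1, 2]
```

The double loop in poly_mul is the uncollected sum (11.2.6); accumulating into p[i + j] is the collection of terms that turns it into (11.2.7).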


Proposition 11.2.8 There is a unique commutative ring structure on the set of polynomials
R[x] having these properties:

• Addition of polynomials is defined by (11.2.5).

• Multiplication of polynomials is defined by (11.2.7).

• The ring R becomes a subring of R[x] when the elements of R are identified with
the constant polynomials.

Since polynomial algebra is familiar and since the proof of this proposition has no interesting
features, we omit it. □

Division with remainder is an important operation on polynomials.

Proposition 11.2.9 Division with Remainder. Let R be a ring, let f be a monic polynomial
and let g be any polynomial, both with coefficients in R. There are uniquely determined
polynomials q and r in R[x] such that

$g(x) = f(x)q(x) + r(x),$

and such that the remainder r, if it is not zero, has degree less than the degree of f. Moreover,
f divides g in R[x] if and only if the remainder r is zero.


The proof of this proposition follows the algorithm for division of polynomials that one
learns in school. □
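
The school algorithm can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive implementation: coefficients are Python integers, polynomials are lists [a_0, a_1, ...] without trailing zeros, and the name divide_monic is hypothetical. Because f is monic, the algorithm never divides in the coefficient ring.

```python
def divide_monic(g, f):
    """Return (q, r) with g = f*q + r, where r == [] or deg r < deg f.

    Polynomials are coefficient lists [a_0, a_1, ...] without trailing zeros.
    f must be monic, so no division of coefficients is ever needed.
    """
    if not f or f[-1] != 1:
        raise ValueError("f must be monic")
    d = len(f) - 1                     # degree of f
    r = list(g)
    q = [0] * max(len(r) - d, 0)
    while len(r) - 1 >= d:             # while deg r >= deg f
        shift = len(r) - 1 - d
        c = r[-1]                      # leading coefficient of the current remainder
        q[shift] = c
        for i, fi in enumerate(f):     # r -= c * x^shift * f
            r[shift + i] -= c * fi
        while r and r[-1] == 0:        # the leading term has cancelled; trim
            r.pop()
    return q, r


g = [5, 2, 0, 1]    # x^3 + 2x + 5
f = [-2, 1]         # x - 2
print(divide_monic(g, f))   # ([6, 2, 1], [17])
```

Here x^3 + 2x + 5 = (x - 2)(x^2 + 2x + 6) + 17. Calling the function with a divisor such as 2x + 1 raises an error in this sketch, since that divisor is not monic.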

Corollary 11.2.10 Division with remainder can be done whenever the leading coefficient of
f is a unit. In particular, it can be done whenever the coefficient ring is a field and $f \ne 0$.

If the leading coefficient is a unit u, we can factor it out of f. □

However, one cannot divide $x^2 + 1$ by $2x + 1$ in the ring Z[x] of integer polynomials.

Corollary 11.2.11 Let g(x) be a polynomial in R[x], and let $\alpha$ be an element of R. The
remainder of division of g(x) by $x - \alpha$ is $g(\alpha)$. Thus $x - \alpha$ divides g in R[x] if and only if
$g(\alpha) = 0$.

This corollary is proved by substituting $x = \alpha$ into the equation $g(x) = (x - \alpha)q(x) + r$ and
noting that r is a constant. □
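
Corollary 11.2.11 can be checked numerically with Horner's scheme (synthetic division), which computes the quotient by x - alpha and the value g(alpha) in a single pass. The sketch below assumes integer coefficients stored as a list [a_0, a_1, ...]; the name synthetic_division is an illustrative choice.

```python
def synthetic_division(g, alpha):
    """Divide g by (x - alpha) with Horner's scheme.

    g is a coefficient list [a_0, a_1, ...]. Returns (q, r), where q is the
    quotient as a coefficient list and r is the constant remainder, equal to g(alpha).
    """
    acc = 0
    partial = []
    for c in reversed(g):          # run from the leading coefficient down to a_0
        acc = acc * alpha + c
        partial.append(acc)
    r = partial.pop() if partial else 0   # the last Horner value is g(alpha)
    q = list(reversed(partial))           # the earlier values are the quotient
    return q, r


print(synthetic_division([5, 2, 0, 1], 2))    # ([6, 2, 1], 17)
print(synthetic_division([-4, 0, 1], 2))      # ([2, 1], 0)
```

In the second call the remainder is 0, so x - 2 divides x^2 - 4, with quotient x + 2.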


Polynomials are fundamental to the theory of rings, and we will also want to use
polynomials in several variables. There is no major change in the definitions.


• A monomial is a formal product of some variables $x_1, \ldots, x_n$ of the form

$x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n},$

where the exponents $i_\nu$ are non-negative integers. The degree of a monomial, sometimes
called the total degree, is the sum $i_1 + \cdots + i_n$.


An n-tuple $(i_1, \ldots, i_n)$ is called a multi-index, and vector notation $i = (i_1, \ldots, i_n)$
for multi-indices is convenient. Using multi-index notation, we may write a monomial
symbolically as $x^i$:

(11.2.12) $x^i = x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}.$


The monomial $x^0$, with $0 = (0, \ldots, 0)$, is denoted by 1. A polynomial in the variables
$x_1, \ldots, x_n$, with coefficients in a ring R, is a linear combination of finitely many monomials,
with coefficients in R. With multi-index notation, a polynomial $f(x) = f(x_1, \ldots, x_n)$ can
be written in exactly one way in the form

(11.2.13) $f(x) = \sum_i a_i x^i,$

where i runs through all multi-indices $(i_1, \ldots, i_n)$, the coefficients $a_i$ are in R, and only
finitely many of these coefficients are different from zero.

A polynomial in which all monomials with nonzero coefficients have (total) degree d
is called a homogeneous polynomial.

Using multi-index notation, formulas (11.2.5) and (11.2.7) define addition and multi-
plication of polynomials in several variables, and the analogue of Proposition 11.2.8 is true.
However, division with remainder requires more thought. We will come back to it below
(see Corollary 11.3.9).

The ring of polynomials with coefficients in R is usually denoted by one of the symbols


(11.2.14) $R[x_1, \ldots, x_n]$ or $R[x],$


where the symbol x is understood to refer to the set of variables $\{x_1, \ldots, x_n\}$. When no set
of variables has been introduced, R[x] denotes the polynomial ring in one variable.
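
Multi-index notation translates naturally into a dictionary keyed by exponent tuples. The sketch below is one possible representation, assuming integer coefficients; the names mv_mul and is_homogeneous are hypothetical. A polynomial is a dict mapping a multi-index (i_1, ..., i_n) to its nonzero coefficient, which mirrors the uniqueness of the form (11.2.13).

```python
from collections import defaultdict


def mv_mul(f, g):
    """Multiply two polynomials stored as {multi-index: coefficient}.

    The monomial x^i times x^j is x^(i+j), where multi-indices add as vectors.
    """
    out = defaultdict(int)
    for i, a in f.items():
        for j, b in g.items():
            k = tuple(s + t for s, t in zip(i, j))
            out[k] += a * b
    return {k: c for k, c in out.items() if c != 0}   # keep nonzero terms only


def is_homogeneous(f):
    """True if all monomials with nonzero coefficient have the same total degree."""
    return len({sum(i) for i, c in f.items() if c != 0}) <= 1


f = {(1, 0): 1, (0, 1): 1}     # x + y
g = {(1, 0): 1, (0, 1): -1}    # x - y
print(mv_mul(f, g))            # {(2, 0): 1, (0, 2): -1}
print(is_homogeneous(mv_mul(f, g)))   # True
```

The two xy terms cancel, leaving x^2 - y^2, which is homogeneous of degree 2.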


11.3 HOMOMORPHISMS AND IDEALS 


• A ring homomorphism $\varphi: R \to R'$ is a map from one ring to another which is compatible
with the laws of composition and which carries the unit element 1 of R to the unit element 1
in R′: a map such that, for all a and b in R,


(11.3.1) $\varphi(a + b) = \varphi(a) + \varphi(b)$, $\varphi(ab) = \varphi(a)\varphi(b)$, and $\varphi(1) = 1$.

The map

(11.3.2) $\varphi: \mathbb{Z} \to \mathbb{F}_p$

that sends an integer to its congruence class modulo p is a ring homomorphism.

An isomorphism of rings is a bijective homomorphism, and if there is an isomorphism
from R to R′, the two rings are said to be isomorphic. We often use the notation $R \approx R'$ to
indicate that two rings R and R′ are isomorphic.

A word about the third condition of (11.3.1): The assumption that a homomorphism $\varphi$
is compatible with addition implies that it is a homomorphism from the additive group $R^+$
of R to the additive group $R'^+$. A group homomorphism carries the identity to the identity,
so $\varphi(0) = 0$. But we can’t conclude that $\varphi(1) = 1$ from compatibility with multiplication,
so that condition must be listed separately. (R is not a group with respect to $\times$.) For example,
the zero map $R \to R'$ that sends all elements of R to zero is compatible with $+$ and $\times$, but
it doesn’t send 1 to 1 unless 1 = 0 in R′. The zero map is not called a ring homomorphism
unless R′ is the zero ring (see (11.1.5)).
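
The three conditions of (11.3.1) can be spot-checked on a finite sample of integers. The following sketch assumes p = 7 and represents elements of F_p by the integers 0, ..., p - 1; the helper names are illustrative, and a finite check of this kind is evidence, not a proof.

```python
p = 7   # an assumed prime, chosen only for illustration


def phi(a):
    """Reduction modulo p, the map (11.3.2)."""
    return a % p


def zero_map(a):
    """The map sending every integer to 0 in F_p."""
    return 0


sample = range(-10, 11)


def respects_sum_and_product(h):
    """Check the first two conditions of (11.3.1) on the sample."""
    return all(
        h(a + b) == (h(a) + h(b)) % p and h(a * b) == (h(a) * h(b)) % p
        for a in sample
        for b in sample
    )


def is_ring_homomorphism(h):
    """All three conditions: compatibility with + and x, and h(1) = 1."""
    return respects_sum_and_product(h) and h(1) == 1 % p


print(respects_sum_and_product(phi), is_ring_homomorphism(phi))            # True True
print(respects_sum_and_product(zero_map), is_ring_homomorphism(zero_map))  # True False
```

The zero map passes the first two tests and fails only the third, which is why the condition on 1 has to be listed separately.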

The most important ring homomorphisms are obtained by evaluating polynomials.
Evaluation of real polynomials at a real number a defines a homomorphism


(11.3.3) $\mathbb{R}[x] \to \mathbb{R}$, that sends $p(x) \rightsquigarrow p(a)$.

One can also evaluate real polynomials at a complex number such as i, to obtain a
homomorphism $\mathbb{R}[x] \to \mathbb{C}$ that sends $p(x) \rightsquigarrow p(i)$.


The general formulation of the principle of evaluation of polynomials is this:


Proposition 11.3.4 Substitution Principle. Let $\varphi: R \to R'$ be a ring homomorphism, and let
R[x] be the ring of polynomials with coefficients in R.


(a) Let $\alpha$ be an element of R′. There is a unique homomorphism $\Phi: R[x] \to R'$ that agrees
with the map $\varphi$ on constant polynomials, and that sends $x \rightsquigarrow \alpha$.

(b) More generally, given elements $\alpha_1, \ldots, \alpha_n$ of R′, there is a unique homomorphism
$\Phi: R[x_1, \ldots, x_n] \to R'$, from the polynomial ring in n variables to R′, that agrees with
$\varphi$ on constant polynomials and that sends $x_\nu \rightsquigarrow \alpha_\nu$, for $\nu = 1, \ldots, n$.


Proof. (a) Let us denote the image $\varphi(a)$ of an element a of R by $a'$. Using the fact that
$\Phi$ is a homomorphism that restricts to $\varphi$ on R and sends x to $\alpha$, we see that it acts on a
polynomial $f(x) = \sum a_i x^i$ by sending


(11.3.5) $\Phi\left(\sum a_i x^i\right) = \sum \Phi(a_i)\Phi(x)^i = \sum a_i' \alpha^i.$


In words, $\Phi$ acts on the coefficients of a polynomial as $\varphi$, and it substitutes $\alpha$ for x. Since this
formula describes $\Phi$, we have proved the uniqueness of the substitution homomorphism.
To prove its existence, we take this formula as the definition of $\Phi$, and we show that $\Phi$ is a
homomorphism $R[x] \to R'$. It is clear that 1 is sent to 1, and it is easy to verify compatibility
with addition of polynomials. Compatibility with multiplication is checked using formula
(11.2.6):


$\Phi(fg) = \Phi\left(\sum_{i,j} a_i b_j x^{i+j}\right) = \sum_{i,j} \Phi(a_i b_j x^{i+j}) = \sum_{i,j} a_i' b_j' \alpha^{i+j}$
$= \left(\sum_i a_i' \alpha^i\right)\left(\sum_j b_j' \alpha^j\right) = \Phi(f)\Phi(g).$


With multi-index notation, the proof of (b) becomes the same as that of (a). □
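
Formula (11.3.5) is easy to test in a concrete case. The sketch below assumes that the coefficient homomorphism is reduction of integers modulo a prime p and that alpha is an element of F_p, represented by an integer; the names substitute and poly_mul are illustrative. It checks compatibility with multiplication for one pair of polynomials and every alpha, which illustrates the proof without replacing it.

```python
def poly_mul(a, b):
    """Product of coefficient lists [a_0, a_1, ...], collecting terms as in (11.2.7)."""
    if not a or not b:
        return []
    p = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj
    return p


def substitute(f, alpha, p):
    """Phi(f) = sum of phi(a_i) * alpha^i in F_p, where phi is reduction modulo p."""
    return sum((a % p) * pow(alpha, i, p) for i, a in enumerate(f)) % p


p = 5
f = [1, 2]       # 1 + 2x
g = [3, 0, 1]    # 3 + x^2
fg = poly_mul(f, g)

print(substitute(f, 3, p), substitute(g, 3, p), substitute(fg, 3, p))   # 2 2 4

# Phi(fg) = Phi(f) * Phi(g) for every alpha in F_5
print(all(
    substitute(fg, alpha, p) == (substitute(f, alpha, p) * substitute(g, alpha, p)) % p
    for alpha in range(p)
))   # True
```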


Here is a simple example of the substitution principle in which the coefficient ring
R changes. Let $\psi: R \to S$ be a ring homomorphism. Composing $\psi$ with the inclusion of
S as a subring of the polynomial ring S[x], we obtain a homomorphism $\varphi: R \to S[x]$.
The substitution principle asserts that there is a unique extension of $\varphi$ to a homomorphism
$\Phi: R[x] \to S[x]$ that sends $x \rightsquigarrow x$. This map operates on the coefficients of a polynomial,
while leaving the variable x fixed. If we denote $\psi(a)$ by $a'$, then it sends a polynomial
$a_n x^n + \cdots + a_1 x + a_0$ to $a_n' x^n + \cdots + a_1' x + a_0'$.

A particularly interesting case is that $\varphi$ is the homomorphism $\mathbb{Z} \to \mathbb{F}_p$ that sends an
integer a to its residue $\bar{a}$ modulo p. This map extends to a homomorphism $\Phi: \mathbb{Z}[x] \to \mathbb{F}_p[x]$,
defined by

(11.3.6) $f(x) = a_n x^n + \cdots + a_0 \rightsquigarrow \bar{a}_n x^n + \cdots + \bar{a}_0 = \bar{f}(x),$

where $\bar{a}_i$ is the residue class of $a_i$ modulo p. It is natural to call the polynomial $\bar{f}(x)$ the
residue of f(x) modulo p.
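
Taking the residue of a polynomial modulo p amounts to reducing each coefficient. A minimal sketch, assuming coefficient lists [a_0, a_1, ...] and the hypothetical name residue_mod_p, shows that the degree can drop when leading coefficients are divisible by p:

```python
def residue_mod_p(f, p):
    """Reduce each coefficient of f modulo p, as in (11.3.6), and trim zeros."""
    r = [a % p for a in f]
    while r and r[-1] == 0:
        r.pop()
    return r


print(residue_mod_p([4, 7, 14], 7))   # [4]        14x^2 + 7x + 4 has residue 4
print(residue_mod_p([1, 2, 1], 2))    # [1, 0, 1]  (x + 1)^2 has residue x^2 + 1
```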

Another example: Let R be any ring, and let P denote the polynomial ring R[x]. One
can use the substitution principle to construct an isomorphism


(11.3.7) $R[x, y] \to P[y] = (R[x])[y].$


This is stated and proved below in Proposition 11.3.8. The domain is the ring of polynomials
in two variables x and y, and the range is the ring of polynomials in y whose coefficients
are polynomials in x. The statement that these rings are isomorphic is a formalization of the
procedure of collecting terms of like degree in y in a polynomial f(x, y). For example,


$x^4 y + x^3 - 3x^2 y + y^2 + 2 = y^2 + (x^4 - 3x^2)y + (x^3 + 2).$


This procedure can be useful. For one thing, one may end up with a polynomial that is monic
in the variable y, as happens in the example above. If so, one can do division with remainder
(see Corollary 11.3.9 below).
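
Collecting terms of like degree in y can be sketched as a regrouping of a dictionary. The representation below is an assumption made for illustration: a polynomial in x and y is a dict {(i, j): c} for the terms c x^i y^j, and the result maps each power j of y to a polynomial in x stored as {i: c}. The function name collect_in_y and the sample polynomial are illustrative.

```python
from collections import defaultdict


def collect_in_y(f):
    """Regroup {(i, j): c}, meaning c * x^i * y^j, as {j: {i: c}}.

    The inner dict for each j is the coefficient of y^j, a polynomial in x.
    """
    out = defaultdict(dict)
    for (i, j), c in f.items():
        if c != 0:
            out[j][i] = out[j].get(i, 0) + c
    return dict(out)


# x^4 y + x^3 - 3 x^2 y + y^2 + 2
f = {(4, 1): 1, (3, 0): 1, (2, 1): -3, (0, 2): 1, (0, 0): 2}
grouped = collect_in_y(f)
print(grouped[2])   # {0: 1}          coefficient of y^2 is 1, so f is monic in y
print(grouped[1])   # {4: 1, 2: -3}   coefficient of y is x^4 - 3x^2
print(grouped[0])   # {3: 1, 0: 2}    constant term in y is x^3 + 2
```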


Proposition 11.3.8 Let $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_n)$ denote sets of variables.
There is a unique isomorphism $R[x, y] \to R[x][y]$, which is the identity on R and which
sends the variables to themselves.


This is very elementary, but it would be boring to verify compatibility of multiplication in
the two rings directly.


Proof. We note that since R is a subring of R[x] and R[x] is a subring of R[x][y], R is also a
subring of R[x][y]. Let $\varphi$ be the inclusion of R into R[x][y]. The substitution principle tells
us that there is a unique homomorphism $\Phi: R[x, y] \to R[x][y]$, which extends $\varphi$ and sends
the variables $x_\mu$ and $y_\nu$ wherever we want. So we can send the variables to themselves.
The map $\Phi$ thus constructed is the required isomorphism. It isn’t difficult to see that $\Phi$ is
bijective. One way to show this would be to use the substitution principle again, to define
the inverse map. □


Corollary 11.3.9 Let f(x, y) and g(x, y) be polynomials in two variables, elements of
R[x, y]. Suppose that, when regarded as a polynomial in y, f is a monic polynomial
of degree m. There are uniquely determined polynomials q(x, y) and r(x, y) such that
g = fq + r, and such that if r(x, y) is not zero, its degree in the variable y is less than m.


This follows from Propositions 11.2.9 and 11.3.8. □


Another case in which one can describe homomorphisms easily is when the domain is
the ring of integers.


Proposition 11.3.10 Let R be a ring. There is exactly one homomorphism $\varphi: \mathbb{Z} \to R$ from
the ring of integers to R. It is the map defined, for $n \ge 0$, by $\varphi(n) = 1 + \cdots + 1$ (n terms)
and $\varphi(-n) = -\varphi(n)$.


Sketch of Proof. Let $\varphi: \mathbb{Z} \to R$ be a homomorphism. By definition of a homomorphism,
$\varphi(1) = 1$ and $\varphi(n + 1) = \varphi(n) + \varphi(1)$. This recursive definition describes $\varphi$ on the natural
numbers, and together with $\varphi(-n) = -\varphi(n)$ if $n > 0$ and $\varphi(0) = 0$, it determines $\varphi$ uniquely.
So it is the only map $\mathbb{Z} \to R$ that could be a homomorphism, and it isn’t hard to convince
oneself that it is one. To prove this formally, one would go back to the definitions of addition
and multiplication of integers (see Appendix). □


Proposition (11.3.10) allows us to identify the image of an integer in an arbitrary ring R.
We interpret the symbol 3, for example, as the element 1 + 1 + 1 of R.
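
The recursive description in Proposition 11.3.10 can be written down for any ring whose operations are supplied as functions. The sketch below is an illustration under assumptions: the function name image_of_integer and its parameters are hypothetical, and the target ring is taken to be Z/6, with elements represented by 0, ..., 5.

```python
def image_of_integer(n, zero, one, add, neg):
    """Image of the integer n under the unique homomorphism Z -> R.

    For n >= 0 this is 1 + ... + 1 (n terms); for n < 0 it is the negative
    of the image of -n. The ring R is described by zero, one, add, and neg.
    """
    total = zero
    for _ in range(abs(n)):
        total = add(total, one)
    return neg(total) if n < 0 else total


# The ring Z/6, with residues represented by the integers 0..5
m = 6
ring = dict(zero=0, one=1, add=lambda a, b: (a + b) % m, neg=lambda a: (-a) % m)

print(image_of_integer(3, **ring))    # 3, the element 1 + 1 + 1
print(image_of_integer(8, **ring))    # 2
print(image_of_integer(-1, **ring))   # 5
```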


• Let $\varphi: R \to R'$ be a ring homomorphism. The kernel of $\varphi$ is the set of elements of R that
map to zero:


(11.3.11) $\ker \varphi = \{s \in R \mid \varphi(s) = 0\}.$


This is the same as the kernel obtained when one regards $\varphi$ as a homomorphism of additive
groups $R^+ \to R'^+$. So what we have learned about kernels of group homomorphisms
applies. For instance, $\varphi$ is injective if and only if $\ker \varphi = \{0\}$.

As you will recall, the kernel of a group homomorphism is not only a subgroup, it
is a normal subgroup. Similarly, the kernel of a ring homomorphism is closed under the
operation of addition, and it has a property that is stronger than closure under multiplication:


(11.3.12) If s is in $\ker \varphi$, then for every element r of R, rs is in $\ker \varphi$.


For, if $\varphi(s) = 0$, then $\varphi(rs) = \varphi(r)\varphi(s) = \varphi(r)0 = 0$.
This property is abstracted in the concept of an ideal.


Definition 11.3.13. An ideal I of a ring R is a nonempty subset of R with these properties:


• I is closed under addition, and
• If s is in I and r is in R, then rs is in I.


The kernel of a ring homomorphism is an ideal.

The peculiar term “ideal” is an abbreviation of the phrase “ideal element” that was
formerly used in number theory. We will see in Chapter 13 how it arose. A good way,
probably a better way, to think of the definition of an ideal is this equivalent formulation:


(11.3.14) I is not empty, and a linear combination $r_1 s_1 + \cdots + r_k s_k$
of elements $s_i$ of I with coefficients $r_i$ in R is in I.


• In any ring R, the multiples of a particular element a form an ideal called the principal
ideal generated by a. An element b of R is in this ideal if and only if b is a multiple of a,
which is to say, if and only if a divides b in R.


There are several notations for this principal ideal:

(11.3.15) $(a) = aR = Ra = \{ra \mid r \in R\}.$


The ring R itself is the principal ideal (1), and because of this it is called the unit ideal.
It is the only ideal that contains a unit of the ring. The set consisting of zero alone is the
principal ideal (0), and is called the zero ideal. An ideal I is proper if it is neither the zero
ideal nor the unit ideal.

Every ideal I satisfies the requirements for a subring, except that the unit element 1 of
R will not be in I unless I is the whole ring. Unless I is equal to R, it will not be what we call
a subring.


Examples 11.3.16 


(a) Let $\varphi$ be the homomorphism $\mathbb{R}[x] \to \mathbb{R}$ defined by substituting the real number 2 for x.
Its kernel, the set of polynomials that have 2 as a root, can be described as the set of
polynomials divisible by $x - 2$. This is a principal ideal that might be denoted by $(x - 2)$.


(b) Let $\Phi: \mathbb{R}[x, y] \to \mathbb{R}[t]$ be the homomorphism that is the identity on the real numbers, and
that sends $x \rightsquigarrow t^2$, $y \rightsquigarrow t^3$. Then it sends $g(x, y) \rightsquigarrow g(t^2, t^3)$. The polynomial $f(x, y) =
y^2 - x^3$ is in the kernel of $\Phi$. We’ll show that the kernel is the principal ideal (f)
generated by f, i.e., that if g(x, y) is a polynomial and if $g(t^2, t^3) = 0$, then f
divides g. To show this, we regard f as a polynomial in y whose coefficients are
polynomials in x (see (11.3.8)). It is a monic polynomial in y, so we can do division
with remainder: $g = fq + r$, where q and r are polynomials, and where the remainder
r, if not zero, has degree at most 1 in y. We write the remainder as a polynomial in
y: $r(x, y) = r_1(x)y + r_0(x)$. If $g(t^2, t^3) = 0$, then both g and fq are in the kernel of $\Phi$,
so r is too: $r(t^2, t^3) = r_1(t^2)t^3 + r_0(t^2) = 0$. The monomials that appear in $r_0(t^2)$ have
even degree, while those in $r_1(t^2)t^3$ have odd degree. Therefore, in order for $r(t^2, t^3)$ to
be zero, $r_0(x)$ and $r_1(x)$ must both be zero. Since the remainder is zero, f divides g. □


The notation (a) for a principal ideal is convenient, but it is ambiguous because the ring 
isn’t mentioned. For instance, (x — 2) could stand for an ideal of R[x] or of Z[x], depending 
on the circumstances. When several rings are being discussed, a different notation may be 
preferable. 


• The ideal I generated by a set of elements {a_1, ..., a_n} of a ring R is the smallest ideal that
contains those elements. It can be described as the set of all linear combinations


(11.3.17)    r_1 a_1 + ··· + r_n a_n

with coefficients r_i in the ring. This ideal is often denoted by (a_1, ..., a_n):

(11.3.18)    (a_1, ..., a_n) = {r_1 a_1 + ··· + r_n a_n | r_i ∈ R}.


For instance, the kernel K of the homomorphism φ: Z[x] → F_p that sends f(x) to
the residue of f(0) modulo p is the ideal (p, x) of Z[x] generated by p and x. Let’s check
this. First, p and x are in the kernel, so (p, x) ⊂ K. To show that K ⊂ (p, x), we let
f(x) = a_n x^n + ··· + a_1 x + a_0 be an integer polynomial. Then f(0) = a_0. If a_0 ≡ 0 modulo p,
say a_0 = bp, then f is the linear combination bp + (a_n x^{n−1} + ··· + a_1)x of p and x. So f
is in the ideal (p, x).
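
The check above is constructive, and a small Python sketch may make it concrete. The function names and the coefficient-list representation (lowest degree first) are assumptions made for illustration; they do not come from the text.

```python
def split_in_p_x(coeffs, p):
    """Write f = b*p + h*x when f(0) is divisible by p.

    coeffs[i] is the integer coefficient of x^i (hypothetical representation,
    assumed nonempty). Returns (b, h), or None if f is not in the kernel.
    """
    a0 = coeffs[0]
    if a0 % p != 0:
        return None              # f(0) is not 0 modulo p
    b = a0 // p                  # a0 = b*p
    h = list(coeffs[1:])         # h(x) = a_n x^{n-1} + ... + a_1
    return b, h


def recombine(b, h, p):
    """Rebuild the coefficient list of b*p + h(x)*x."""
    return [b * p] + list(h)


f = [10, -3, 0, 7]               # f(x) = 7x^3 - 3x + 10, with f(0) = 10 = 2*5
b, h = split_in_p_x(f, 5)
print(b, h)
print(recombine(b, h, 5) == f)
```

The constant term supplies the multiple of p, and everything else is visibly a multiple of x, which is exactly the linear combination used in the argument.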

The number of elements required to generate an ideal can be arbitrarily large.
The ideal (x^3, x^2 y, xy^2, y^3) of the polynomial ring C[x, y] consists of the polynomials
in which every term has degree at least 3. It cannot be generated by fewer than four
elements.


In the rest of this section, we describe ideals in some simple cases. 
Proposition 11.3.19


(a) The only ideals of a field are the zero ideal and the unit ideal. 
(b) A ring that has exactly two ideals is a field. 


Proof. If an ideal I of a field F contains a nonzero element a, that element is invertible.
Then I contains a^{-1}a = 1, and is the unit ideal. The only ideals of F are (0) and (1).

Assume that R has exactly two ideals. The properties that distinguish fields among
rings are that 1 ≠ 0 and that every nonzero element a of R has a multiplicative inverse. We
have seen that 1 = 0 happens only in the zero ring. The zero ring has only one ideal, the zero
ideal. Since our ring has two ideals, 1 ≠ 0 in R. The two ideals (1) and (0) are different, so
they are the only two ideals of R. 

To show that every nonzero element a of R has an inverse, we consider the principal 
ideal (a). It is not the zero ideal because it contains the element a. Therefore it is the unit 
ideal. The elements of (a) are the multiples of a, so 1 is a multiple of a, and therefore a is 
invertible. □


Corollary 11.3.20 Every homomorphism φ: F → R from a field F to a nonzero ring R is
injective.


Proof. The kernel of φ is an ideal of F. So according to Proposition 11.3.19, the kernel is
either (0) or (1). If ker φ were the unit ideal (1), φ would be the zero map. But the zero
map isn’t a homomorphism when R isn’t the zero ring. Therefore ker φ = (0), and φ is
injective. □


Proposition 11.3.21 The ideals in the ring of integers are the subgroups of Z^+, and they are
principal ideals.


An ideal of the ring Z of integers will be a subgroup of the additive group Z^+. It was proved
before (2.3.3) that every subgroup of Z^+ has the form Zn. □


The proof that subgroups of Z^+ have the form Zn can be adapted to the polynomial
ring F[x].


Proposition 11.3.22 Every ideal in the ring F[x] of polynomials in one variable x over a
field F is a principal ideal. A nonzero ideal I in F[x] is generated by the unique monic
polynomial of lowest degree that it contains.


Proof. Let I be an ideal of F[x]. The zero ideal is principal, so we may assume that I is not
the zero ideal. The first step in finding a generator for a nonzero subgroup of Z is to choose
its smallest positive element. The substitute here is to choose a nonzero polynomial f in I
of minimal degree. Since F is a field, we may choose f to be monic. We claim that I is the
principal ideal (f) of polynomial multiples of f. Since f is in I, every multiple of f is in I,
so (f) ⊂ I. To prove that I ⊂ (f), we choose an element g of I, and we use division with
remainder to write g = fq + r, where r, if not zero, has lower degree than f. Since g and f
are in I, g − fq = r is in I too. Since f has minimal degree among nonzero elements of I,
the only possibility is that r = 0. Therefore f divides g, and g is in (f).
If f_1 and f_2 are two monic polynomials of lowest degree in I, their difference is in I
and has lower degree than f_1 and f_2, so it must be zero. Therefore the monic polynomial of
lowest degree is unique. □


Example 11.3.23 Let γ = ∛2 be the real cube root of 2, and let Φ: Q[x] → C be the
substitution map that sends x ⇝ γ. The kernel of this map is a principal ideal, generated by
the monic polynomial of lowest degree in Q[x] that has γ as a root (11.3.22). The polynomial
f = x^3 − 2 is in the kernel, and because ∛2 is not a rational number, f is not a product
f = gh of two nonconstant polynomials with rational coefficients. So it is the lowest degree
polynomial in the kernel, and therefore it generates the kernel.

We restrict the map Φ to the integer polynomial ring Z[x], obtaining a homomorphism
Φ′: Z[x] → C. The next lemma shows that the kernel of Φ′ is the principal ideal of Z[x]
generated by the same polynomial f.


Lemma 11.3.24 Let f be a monic integer polynomial, and let g be another integer polynomial. 
If f divides g in Q[x], then f divides g in Z[x]. 


Proof. Since f is monic, we can do division with remainder in Z[x]: g = fq + r. This
equation remains true in the ring Q[x], and division with remainder in Q[x] gives the same
result. In Q[x], f divides g. Therefore r = 0, and f divides g in Z[x]. □


The proof of the following corollary is similar to the proof of existence of the greatest 
common divisor in the ring of integers ((2.3.5), see also (12.2.8)). 


Corollary 11.3.25 Let R denote the polynomial ring F[x] in one variable over a field F, 
and let f and g be elements of R, not both zero. Their greatest common divisor d(x) is the 
unique monic polynomial that generates the ideal (f, g). It has these properties: 


(a) Rd = Rf + Rg.

(b) d divides f and g.

(c) If a polynomial e = e(x) divides both f and g, it also divides d.

(d) There are polynomials p and q such that d = pf + qg. □
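
Property (d) can be made effective with the Euclidean algorithm, which the text does not spell out here. The following is a minimal sketch for F = Q, assuming polynomials are stored as lists of `Fraction` coefficients with the lowest degree first and the zero polynomial as the empty list. All function names are hypothetical.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly(*coeffs):
    """Build a polynomial from coefficients, lowest degree first."""
    return trim([Fraction(c) for c in coeffs])


def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(g, f):
    """Division with remainder: g = f*q + r with deg r < deg f (f nonzero)."""
    f = trim(f)
    q, r = [], trim(g)
    while len(r) >= len(f):
        c = r[-1] / f[-1]                             # cancel the leading term
        term = [Fraction(0)] * (len(r) - len(f)) + [c]
        q = add(q, term)
        r = add(r, mul([-t for t in term], f))
    return q, r


def ext_gcd(f, g):
    """Return (d, p, q) with d monic and d = p*f + q*g (f, g not both zero)."""
    r0, r1 = trim(f), trim(g)
    p0, p1 = [Fraction(1)], []
    q0, q1 = [], [Fraction(1)]
    while r1:
        k, r = divmod_poly(r0, r1)
        neg_k = [-c for c in k]
        r0, r1 = r1, r
        p0, p1 = p1, add(p0, mul(neg_k, p1))
        q0, q1 = q1, add(q0, mul(neg_k, q1))
    scale = [1 / r0[-1]]                              # normalize d to be monic
    return mul(scale, r0), mul(scale, p0), mul(scale, q0)


f = poly(-1, 0, 1)        # x^2 - 1
g = poly(2, -3, 1)        # x^2 - 3x + 2
d, p, q = ext_gcd(f, g)
print(d, p, q)
print(add(mul(p, f), mul(q, g)) == d)
```

For this sample pair the common factor is x − 1, and the returned p and q exhibit it as a combination of f and g, as in part (d).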


The definition of the characteristic of a ring R is the same as for a field. It is the 
non-negative integer n that generates the kernel of the homomorphism φ: Z → R (11.3.10).
If n = 0, the characteristic is zero, and this means that no positive multiple of 1 in R is equal
to zero. Otherwise n is the smallest positive integer such that “n times 1” is zero in R. The
characteristic of a ring can be any non-negative integer. 


11.4 QUOTIENT RINGS 


Let I be an ideal of a ring R. The cosets of the additive subgroup I^+ of R^+ are the subsets
a + I. It follows from what has been proved for groups that the set of cosets R̄ = R/I is a
group under addition. It is also a ring:

Theorem 11.4.1 Let I be an ideal of a ring R. There is a unique ring structure on the set
R̄ of additive cosets of I such that the map π: R → R̄ that sends a ⇝ ā = [a + I] is a ring
homomorphism. The kernel of π is the ideal I.


As with quotient groups, the map π is referred to as the canonical map, and R̄ is called
the quotient ring. The image ā of an element a is called the residue of the element.


Proof. This proof has already been carried out for the ring of integers (Section 2.9). We
want to put a ring structure on R̄, and if we forget about multiplication and consider only
the addition law, I becomes a normal subgroup of R^+, for which the proof has been given
(2.12.2). What is left to do is to define multiplication, to verify the ring axioms, and to prove
that π is a homomorphism. Let ā = [a + I] and b̄ = [b + I] be elements of R̄. We would
like to define the product by setting āb̄ = [ab + I]. The set of products


P = (a + I)(b + I) = {rs | r ∈ a + I, s ∈ b + I}


isn’t always a coset of I. However, as in the case of the ring of integers, P is always contained
in the coset ab + I. If we write r = a + u and s = b + v with u and v in I, then


(a + u)(b + v) = ab + (av + bu + uv).


Since I is an ideal that contains u and v, it contains av + bu + uv. This is all that is needed
to define the product coset: It is the coset that contains the set of products. That coset is
unique because the cosets partition R.

The proofs of the remaining assertions follow the patterns set in Section 2.9. □
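
The key point of the proof, that the set of products P lies inside the coset ab + I without necessarily filling it, can be checked by brute force in a small finite ring. The sketch below assumes the illustrative choice R = Z/(12) and I = (6), which is not taken from the text; the helper names are hypothetical.

```python
N = 12                                    # work in R = Z/(12) (illustrative choice)
I = {(6 * k) % N for k in range(N)}       # the ideal generated by 6, i.e. {0, 6}


def coset(a):
    """The coset a + I as a set of residues modulo N."""
    return frozenset((a + u) % N for u in I)


def product_set(a, b):
    """The set P = (a + I)(b + I) of all products rs."""
    return frozenset((r * s) % N for r in coset(a) for s in coset(b))


# P is always contained in the coset ab + I ...
contained = all(product_set(a, b) <= coset(a * b)
                for a in range(N) for b in range(N))
print(contained)

# ... but P need not be the whole coset.
print(sorted(product_set(0, 0)), sorted(coset(0)))
```

Here (0 + I)(0 + I) contains only 0, while the coset 0 + I has two elements, so the product of cosets has to be defined as the coset containing P rather than as P itself.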


As with groups, one often drops the bars over the letters that represent elements of a
quotient ring R̄, remembering that “a = b in R̄” means ā = b̄.


The next theorems are analogous to ones that we have seen for groups: 


Theorem 11.4.2 Mapping Property of Quotient Rings. Let f: R → R′ be a ring homomorphism
with kernel K and let I be another ideal. Let π: R → R̄ be the canonical map from
R to R̄ = R/I.

(a) If I ⊂ K, there is a unique homomorphism f̄: R̄ → R′ such that f̄π = f:


              f
        R ---------> R′
         \          ^
        π \        / f̄
           v      /
           R̄ = R/I

(b) (First Isomorphism Theorem) If f is surjective and I = K, f̄ is an isomorphism. □


The First Isomorphism Theorem is our fundamental method of identifying quotient 
rings. However, it doesn’t apply very often. Quotient rings will be new rings in most cases, and 
this is one reason that the quotient construction is important. The ring C[x, y]/(y^2 − x^3 + 1),
for example, is completely different from any ring we have seen up to now. Its elements are 
functions on an elliptic curve (see [Silverman]). 
The Correspondence Theorem for rings describes the fundamental relationship between
ideals in a ring and a quotient ring.


Theorem 11.4.3 Correspondence Theorem. Let φ: R → ℛ be a surjective ring homomorphism
with kernel K. There is a bijective correspondence between the set of all ideals of ℛ
and the set of ideals of R that contain K:


{ideals of R that contain K} ←→ {ideals of ℛ}.


This correspondence is defined as follows:


• If I is an ideal of R and if K ⊂ I, the corresponding ideal of ℛ is φ(I).
• If ℐ is an ideal of ℛ, the corresponding ideal of R is φ^{-1}(ℐ).


If the ideal I of R corresponds to the ideal ℐ of ℛ, the quotient rings R/I and ℛ/ℐ are
naturally isomorphic.


Note that the inclusion K ⊂ I is the reverse of the one in the mapping property.


Proof of the Correspondence Theorem. We let ℐ be an ideal of ℛ and we let I be an ideal
of R that contains K. We must check the following points:


• φ(I) is an ideal of ℛ.
• φ^{-1}(ℐ) is an ideal of R, and it contains K.
• φ(φ^{-1}(ℐ)) = ℐ, and φ^{-1}(φ(I)) = I.
• If φ(I) = ℐ, then R/I ≈ ℛ/ℐ.


We go through these points in order, referring to the proof of the Correspondence Theorem
2.10.5 for groups when it applies. We have seen before that the image of a subgroup is a
subgroup. So to show that φ(I) is an ideal of ℛ, we need only prove that it is closed under
multiplication by elements of ℛ. Let r̃ be in ℛ and let x̃ be in φ(I). Then x̃ = φ(x) for some
x in I, and because φ is surjective, r̃ = φ(r) for some r in R. Since I is an ideal, rx is in I,
and r̃x̃ = φ(rx), so r̃x̃ is in φ(I).

Next, we verify that φ^{-1}(ℐ) is an ideal of R that contains K. This is true whether or
not φ is surjective. Let’s write φ(a) = ã. By definition of the inverse image, a is in φ^{-1}(ℐ)
if and only if ã is in ℐ. If a is in φ^{-1}(ℐ) and r is in R, then φ(ra) = r̃ã is in ℐ because ℐ is
an ideal, and hence ra is in φ^{-1}(ℐ). The facts that φ^{-1}(ℐ) is closed under sums and that it
contains K were shown in (2.10.4).

The third assertion, the bijectivity of the correspondence, follows from the case of a
group homomorphism.

Finally, suppose that an ideal I of R that contains K corresponds to an ideal ℐ of ℛ,
that is, ℐ = φ(I) and I = φ^{-1}(ℐ). Let π̃: ℛ → ℛ/ℐ be the canonical map, and let f denote
the composed map π̃φ: R → ℛ → ℛ/ℐ. The kernel of f is the set of elements x in R such
that π̃φ(x) = 0, which translates to φ(x) ∈ ℐ, or to x ∈ φ^{-1}(ℐ) = I. The kernel of f is I.
The mapping property, applied to the map f, gives us a homomorphism f̄: R/I → ℛ/ℐ,
and the First Isomorphism Theorem asserts that f̄ is an isomorphism. □
To apply the Correspondence Theorem, it helps to know the ideals of one of the rings.
The next examples illustrate this in very simple situations, in which one of the two rings is
C[t]. We will be able to use the fact that every ideal of C[t] is principal (11.3.22).


Example 11.4.4 (a) Let φ: C[x, y] → C[t] be the homomorphism that sends x ⇝ t and
y ⇝ t^2. This is a surjective map, and its kernel K is the principal ideal of C[x, y] generated
by y − x^2. (The proof of this is similar to the one given in Example 11.3.16.)

The Correspondence Theorem relates ideals I of C[x, y] that contain y − x^2 to ideals
J of C[t], by J = φ(I) and I = φ^{-1}(J). Here J will be a principal ideal, generated by
a polynomial p(t). Let I_1 denote the ideal of C[x, y] generated by y − x^2 and p(x).
Then I_1 contains K, and its image is equal to J. The Correspondence Theorem asserts
that I_1 = I. Every ideal of the polynomial ring C[x, y] that contains y − x^2 has the form
I = (y − x^2, p(x)), for some polynomial p(x).


(b) We identify the ideals of the quotient ring R′ = C[t]/(t^2 − 1) using the canonical
homomorphism π: C[t] → R′. The kernel of π is the principal ideal (t^2 − 1). Let I be an
ideal of C[t] that contains t^2 − 1. Then I is principal, generated by a monic polynomial f,
and the fact that t^2 − 1 is in I means that f divides t^2 − 1. The monic divisors of t^2 − 1 are:
1, t − 1, t + 1 and t^2 − 1. Therefore the ring R′ contains exactly four ideals. They are the
principal ideals generated by the residues of the divisors of t^2 − 1. □


Adding Relations 


We reinterpret the quotient ring construction when the ideal I is principal, say I = (a). In
this situation, we think of R̄ = R/I as the ring obtained by imposing the relation a = 0
on R, or of killing the element a. For instance, the field F_7 will be thought of as the ring
obtained by killing 7 in the ring Z of integers.

Let’s examine the collapsing that takes place in the map π: R → R̄. Its kernel is the
ideal I, so a is in the kernel: π(a) = 0. If b is any element of R, the elements that have the
same image in R̄ as b are those in the coset b + I, and since I = (a) those elements have
the form b + ra. We see that imposing the relation a = 0 in the ring R forces us also to set
b = b + ra for all b and r in R, and that these are the only consequences of killing a.

Any number of relations a_1 = 0, ..., a_n = 0 can be introduced, by working modulo
the ideal I generated by a_1, ..., a_n, the set of linear combinations r_1 a_1 + ··· + r_n a_n, with
coefficients r_i in R. The quotient ring R̄ = R/I is viewed as the ring obtained by killing the
n elements. Two elements b and b′ of R have the same image in R̄ if and only if b′ has the
form b + r_1 a_1 + ··· + r_n a_n for some r_i in R.

The more relations we add, the more collapsing takes place in the map π. If we add
relations carelessly, the worst that can happen is that we may end up with I = R and R̄ = 0.
All relations a = 0 become true when we collapse R to the zero ring.

Here the Correspondence Theorem asserts something that is intuitively clear: Introducing
relations one at a time or all together leads to isomorphic results. To spell this out,
let a and b be elements of a ring R, and let R̄ = R/(a) be the result of killing a in R. Let b̄
be the residue of b in R̄. The Correspondence Theorem tells us that the principal ideal (b̄)
of R̄ corresponds to the ideal (a, b) of R, and that R/(a, b) is isomorphic to R̄/(b̄). Killing
a and b in R at the same time gives the same result as killing b̄ in the ring R̄ that is obtained
by killing a first.
Example 11.4.5 We ask to identify the quotient ring R̄ = Z[i]/(i − 2), the ring obtained from
the Gauss integers by introducing the relation i − 2 = 0. Instead of analyzing this directly,
we note that the kernel of the map Z[x] → Z[i] sending x ⇝ i is the principal ideal of Z[x]
generated by f = x^2 + 1. The First Isomorphism Theorem tells us that Z[x]/(f) ≈ Z[i]. The
image of g = x − 2 is i − 2, so R̄ can also be obtained by introducing the two relations f = 0
and g = 0 into the integer polynomial ring. Let I = (f, g) be the ideal of Z[x] generated by
the two polynomials f and g. Then R̄ ≈ Z[x]/I.

To form R̄, we may introduce the two relations in the opposite order, first killing g,
then f. The principal ideal (g) of Z[x] is the kernel of the homomorphism Z[x] → Z that
sends x ⇝ 2. So when we kill x − 2 in Z[x], we obtain a ring isomorphic to Z, in which the
residue of x is 2. Then the residue of f = x^2 + 1 becomes 5. So we can also obtain R̄ by
killing 5 in Z, and therefore R̄ ≈ F_5.

The rings we have mentioned are summed up in this diagram:


(11.4.6)
                        kill x − 2
              Z[x] -----------------> Z
                |                     |
     kill x^2+1 |                     | kill 5
                v                     v
              Z[i] -----------------> F_5
                        kill i − 2
                                                                  □
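
The identification of Z[i]/(i − 2) with F_5 can be sanity-checked numerically. In the sketch below, a Gauss integer a + bi is stored as a pair (a, b), and the map a + bi ⇝ a + 2b mod 5 is obtained by substituting 2 for i, following the diagram. The function names and the sample range are assumptions made for illustration.

```python
def to_f5(z):
    """Send a + bi (stored as the pair (a, b)) to a + 2b modulo 5."""
    a, b = z
    return (a + 2 * b) % 5


def g_add(z, w):
    return (z[0] + w[0], z[1] + w[1])


def g_mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


sample = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]

# The map respects sums and products on the sample ...
is_hom = all(
    to_f5(g_add(z, w)) == (to_f5(z) + to_f5(w)) % 5
    and to_f5(g_mul(z, w)) == (to_f5(z) * to_f5(w)) % 5
    for z in sample for w in sample
)
print(is_hom)

# ... and it kills i - 2, as well as 5 = (2 - i)(2 + i).
print(to_f5((-2, 1)), to_f5((5, 0)))
```

The product rule holds because the two sides differ by −5bd, which is zero modulo 5. This is a finite check on a sample, not a proof that the kernel is exactly (i − 2).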


11.5 ADJOINING ELEMENTS 


In this section we discuss a procedure closely related to that of adding relations: adjoining
new elements to a ring. Our model for this procedure is the construction of the complex
number field from the real numbers. That construction is completely formal: The complex
number i has no properties other than its defining property: i^2 = −1. We will now describe
the general principle behind this construction. We start with an arbitrary ring R, and consider
the problem of building a bigger ring containing the elements of R and also a new element,
which we denote by α. We will probably want α to satisfy some relation such as α^2 + 1 = 0.
A ring that contains another ring as a subring is called a ring extension. So we are looking
for a suitable extension.

Sometimes the element α may be available in a ring extension R′ that we already know.
In that case, our solution is the subring of R′ generated by R and α, the smallest subring
containing R and α. The subring is denoted by R[α]. We described this ring in Section 11.1 in
the case R = Z, and the description is no different in general: R[α] consists of the elements
β of R′ that have polynomial expressions


β = r_n α^n + ··· + r_1 α + r_0


with coefficients r_i in R.

But as happens when we construct C from R, we may not yet have an extension
containing α. Then we must construct the extension abstractly. We start with the polynomial
ring R[x]. It is generated by R and x. The element x satisfies no relations other than those
implied by the ring axioms, and we will probably want our new element α to satisfy some
relations. But now that we have the ring R[x] in hand, we can add relations to it using the
procedure explained in the previous section on the polynomial ring R[x]. The fact that R is
replaced by R[x] complicates the notation, but aside from this, nothing is different.

For example, we construct the complex numbers by introducing the relation x^2 + 1 = 0
into the ring P = R[x] of real polynomials. We form the quotient ring P̄ = P/(x^2 + 1), and
the residue of x becomes our element i. The relation x̄^2 + 1 = 0 holds in P̄ because the map
π: P → P̄ is a homomorphism and because x^2 + 1 is in its kernel. So P̄ is isomorphic to C.


In general, say that we want to adjoin an element α to a ring R, and that we want α to
satisfy the polynomial relation f(x) = 0, where


(11.5.1)    f(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0,    with a_i in R.


The solution is R′ = R[x]/(f), where (f) is the principal ideal of R[x] generated by f.
We let α denote the residue x̄ of x in R′. Then because the map π: R[x] → R[x]/(f)
is a homomorphism,


(11.5.2)    π(f(x)) = f̄(x̄) = ā_n α^n + ··· + ā_0 = 0.


Here ā_i is the image in R′ of the constant polynomial a_i. So, dropping bars, α satisfies the
relation f(α) = 0. The ring obtained in this way may be denoted by R[α] too.

An example: Let a be an element of a ring R. An inverse of a is an element α that
satisfies the relation


(11.5.3)    aα − 1 = 0.


So we can adjoin an inverse by forming the quotient ring R′ = R[x]/(ax − 1).


The most important case is that our element α is a root of a monic polynomial:

(11.5.4)    f(x) = x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0,    with a_i in R.

We can describe the ring R[α] precisely in this case.


Proposition 11.5.5 Let R be a ring, and let f(x) be a monic polynomial of positive degree n
with coefficients in R. Let R[α] denote the ring R[x]/(f) obtained by adjoining an element
satisfying the relation f(α) = 0.

(a) The set (1, α, ..., α^{n−1}) is a basis of R[α] over R: every element of R[α] can be written
uniquely as a linear combination of this basis, with coefficients in R.

(b) Addition of two linear combinations is vector addition.

(c) Multiplication of linear combinations is as follows: Let β_1 and β_2 be elements of R[α],
and let g_1(x) and g_2(x) be polynomials such that β_1 = g_1(α) and β_2 = g_2(α). One
divides the product polynomial g_1 g_2 by f, say g_1 g_2 = fq + r, where the remainder
r(x), if not zero, has degree < n. Then β_1 β_2 = r(α).


The next lemma should be clear. 


Lemma 11.5.6 Let f be a monic polynomial of degree n in a polynomial ring R[x]. Every
nonzero element of (f) has degree at least n. □
Proof of the proposition. (a) Since R[α] is a quotient of the polynomial ring R[x], every
element β of R[α] is the residue of a polynomial g(x), i.e., β = g(α). Since f is monic, we
can perform division with remainder: g(x) = f(x)q(x) + r(x), where r(x) is either zero or
else has degree less than n (11.2.9). Then since f(α) = 0, β = g(α) = r(α). In this way, β is
written as a combination of the basis. The expression for β is unique because the principal
ideal (f) contains no element of degree < n. This also proves (c), and (b) follows from the
fact that addition in R[x] is vector addition. □
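
The multiplication rule in part (c) is easy to mechanize when the coefficients are integers. The sketch below is one possible implementation, assuming polynomials are stored as integer coefficient lists with the lowest degree first; the function name `mul_mod_monic` and the sample polynomial f = x^3 − 2 are illustrative choices.

```python
def mul_mod_monic(g1, g2, f):
    """Return the coordinates of g1(alpha)*g2(alpha) in the basis
    (1, alpha, ..., alpha^(n-1)), where alpha is a root of the monic f.

    All polynomials are nonempty integer coefficient lists, lowest degree first.
    """
    n = len(f) - 1
    assert n >= 1 and f[-1] == 1, "f must be monic of positive degree"
    # Multiply g1 and g2 as ordinary polynomials.
    prod = [0] * (len(g1) + len(g2) - 1)
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            prod[i + j] += a * b
    # Division with remainder by f: cancel terms of degree >= n from the top.
    for k in range(len(prod) - 1, n - 1, -1):
        c = prod[k]
        for j in range(n + 1):
            prod[k - n + j] -= c * f[j]
    r = prod[:n]
    return r + [0] * (n - len(r))         # pad to exactly n coordinates


f = [-2, 0, 0, 1]                         # f(x) = x^3 - 2
b1 = [0, -1, 1]                           # alpha^2 - alpha
b2 = [1, 0, 1]                            # alpha^2 + 1
print(mul_mod_monic(b1, b2, f))
```

Because f is monic, the reduction step never divides by anything, so the computation stays inside the integers. This is why the monic case is the one that can be described precisely.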


Examples 11.5.7 (a) The kernel of the substitution map Z[x] → C that sends x ⇝ γ = ∛2
is the principal ideal (x^3 − 2) of Z[x] (11.3.23). So Z[γ] is isomorphic to Z[x]/(x^3 − 2). The
proposition shows that (1, γ, γ^2) is a Z-basis for Z[γ]. Its elements are linear combinations
a_0 + a_1 γ + a_2 γ^2, where a_i are integers. If β_1 = (γ^2 − γ) and β_2 = (γ^2 + 1), then


β_1 β_2 = γ^4 − γ^3 + γ^2 − γ = f(γ)(γ − 1) + (γ^2 + γ − 2) = γ^2 + γ − 2.

(b) Let R′ be obtained by adjoining an element δ to F_5 with the relation δ^2 − 3 = 0. Here δ
becomes an abstract square root of 3. Proposition 11.5.5 tells us that the elements of R′ are
the 25 linear expressions a + bδ with coefficients a and b in F_5.

We’ll show that R′ is a field of order 25 by showing that every nonzero element a + bδ
of R′ is invertible. To see this, consider the product c = (a + bδ)(a − bδ) = (a^2 − 3b^2). This
is an element of F_5, and because 3 isn’t a square in F_5, it isn’t zero unless both a and b are
zero. So if a + bδ ≠ 0, c is invertible in F_5. Then the inverse of a + bδ is (a − bδ)c^{-1}.


(c) The procedure used in (b) doesn’t yield a field when it is applied to F_11. The reason is
that F_11 already contains two square roots of 3, namely ±5. If R′ is the ring obtained by
adjoining δ with the relation δ^2 − 3 = 0, we are adjoining an abstract square root of 3, though
F_11 already contains two square roots. At first glance one might expect to get F_11 back. We
don’t, because we haven’t told δ to be equal to 5 or −5. We’ve told δ only that its square is 3.
So δ − 5 and δ + 5 are not zero, but (δ + 5)(δ − 5) = δ^2 − 3 = 0. This cannot happen in a
field. □
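
The contrast between parts (b) and (c) can be observed by brute force. The sketch below stores a + bδ as the pair (a, b) and multiplies using δ^2 = 3; the function names are hypothetical, and the exhaustive search is only practical because the rings are tiny.

```python
def mul(u, v, p):
    """Multiply a + b*delta and c + d*delta over F_p, where delta^2 = 3."""
    a, b = u
    c, d = v
    return ((a * c + 3 * b * d) % p, (a * d + b * c) % p)


def elements(p):
    return [(a, b) for a in range(p) for b in range(p)]


def units(p):
    """Elements that have a multiplicative inverse (exhaustive search)."""
    els = elements(p)
    return {u for u in els if any(mul(u, v, p) == (1, 0) for v in els)}


def zero_divisors(p):
    """Nonzero elements u with u*v = 0 for some nonzero v."""
    nonzero = [u for u in elements(p) if u != (0, 0)]
    return {u for u in nonzero
            if any(mul(u, v, p) == (0, 0) for v in nonzero)}


# p = 5: every nonzero element is a unit, so R' is a field of order 25.
print(len(units(5)), len(zero_divisors(5)))

# p = 11: (delta + 5)(delta - 5) = 0 although neither factor is zero.
print(mul((5, 1), (11 - 5, 1), 11), len(zero_divisors(11)))
```

For p = 5 the search finds no zero divisors, in line with (b). For p = 11 it finds the pair δ + 5, δ − 5 from (c) along with their nonzero multiples.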


It is harder to analyze the structure of the ring obtained by adjoining an element when 
the polynomial relation isn’t monic. 


• There is a point that we have suppressed in our discussion, and we consider it now:
When we adjoin an element α to a ring R with some relation f(α) = 0, will our original
R be a subring of the ring R′ that we construct? We know that R is contained in the
polynomial ring R[x], as the subring of constant polynomials, and we also have the canonical
map π: R[x] → R′ = R[x]/(f). Restricting π to the constant polynomials gives us a
homomorphism R → R′, let’s call it ψ. Is ψ injective? If it isn’t injective, we cannot identify
R with a subring of R′.
The kernel of ψ is the set of constant polynomials in the ideal:


(11.5.8)    ker ψ = R ∩ (f).


It is fairly likely that ker ψ is zero because f will have positive degree. There will have to
be a lot of cancellation to make a polynomial multiple of f have degree zero. The kernel
is zero when α is required to satisfy a monic polynomial relation. But it isn’t always zero.
For instance, let R be the ring Z/(6) of congruence classes modulo 6, and let f be the
polynomial 2x + 1 in R[x]. Then 3f = 6x + 3 = 3. The kernel of the map R → R[x]/(f) is not zero.


11.6 PRODUCT RINGS 


The product G × G′ of two groups was defined in Chapter 2. It is the product set, and the law
of composition is componentwise: (x, x′)(y, y′) = (xy, x′y′). The analogous construction
can be made with rings.


Proposition 11.6.1 Let R and R’ be rings. 


(a) The product set R × R′ is a ring called the product ring, with component-wise addition
and multiplication:


(x, x′) + (y, y′) = (x + y, x′ + y′)    and    (x, x′)(y, y′) = (xy, x′y′).


(b) The additive and multiplicative identities in R × R′ are (0, 0) and (1, 1), respectively.

(c) The projections π: R × R′ → R and π′: R × R′ → R′ defined by π(x, x′) = x and
π′(x, x′) = x′ are ring homomorphisms. The kernels of π and π′ are the ideals {0} × R′
and R × {0}, respectively, of R × R′.

(d) The kernel R × {0} of π′ is a ring, with multiplicative identity e = (1, 0). It is not a
subring of R × R′ unless R′ is the zero ring. Similarly, {0} × R′ is a ring with identity
e′ = (0, 1). It is not a subring of R × R′ unless R is the zero ring.


The proofs of these assertions are very elementary. We omit them, but see the next
proposition for part (d). □

To determine whether or not a given ring is isomorphic to a product ring, one looks 
for the elements that in a product ring would be (1, 0) and (0,1). They are idempotent 
elements. 


• An idempotent element e of a ring S is an element of S such that e^2 = e.


Proposition 11.6.2 Let e be an idempotent element of a ring S. 


(a) The element e′ = 1 − e is also idempotent, e + e′ = 1, and ee′ = 0.

(b) With the laws of composition obtained by restriction from S, the principal ideal eS is
a ring with identity element e, and multiplication by e defines a ring homomorphism
S → eS.

(c) The ideal eS is not a subring of S unless e is the unit element 1 of S and e′ = 0.


(d) The ring S is isomorphic to the product ring eS x e’S. 


Proof. (a) e′^2 = (1 − e)^2 = 1 − 2e + e^2 = e′, and ee′ = e(1 − e) = e − e^2 = 0.


(b) Every ideal I of a ring S has the properties of a ring except for the existence of a
multiplicative identity. In this case, e is an identity element for eS, because if a is in eS,
say a = es, then ea = e^2 s = es = a. The ring axioms show that multiplication by e is a
homomorphism: e(a + b) = ea + eb, e(ab) = e^2 ab = (ea)(eb), and e1 = e.

(c) To be a subring of S, eS must contain the identity 1 of S. If it does, then e and 1 will both
be identity elements of eS, and since the identity in a ring is unique, e = 1 and e′ = 0.


(d) The rule φ(x) = (ex, e′x) defines a homomorphism φ: S → eS × e′S, because both
of the maps x ⇝ ex and x ⇝ e′x are homomorphisms and the laws of composition in the
product ring are componentwise. We verify that this homomorphism is bijective. First, if
φ(x) = (0, 0), then ex = 0 and e′x = 0. If so, then x = (e + e′)x = ex + e′x = 0 too. This
shows that φ is injective. To show that φ is surjective, let (u, v) be an element of eS × e′S,
say u = ex and v = e′y. Then φ(u + v) = (e(ex + e′y), e′(ex + e′y)) = (u, v). So (u, v) is
in the image, and therefore φ is surjective. □


Examples 11.6.3 (a) We go back to the ring R′ obtained by adjoining an abstract square
root of 3 to F_11. Its elements are the 11^2 linear combinations a + bδ, with a and b in F_11 and
δ^2 = 3. We saw in (11.5.7)(c) that this ring is not a field, the reason being that F_11 already
contains two square roots ±5 of 3. The elements e = δ − 5 and e′ = −δ − 5 are idempotents
in R′, and e + e′ = 1. Therefore R′ is isomorphic to the product eR′ × e′R′. Since the order
of R′ is 11^2, |eR′| = |e′R′| = 11. The rings eR′ and e′R′ are both isomorphic to F_11, and R′
is isomorphic to the product ring F_11 × F_11.


(b) We define a homomorphism φ: C[x, y] → C[x] × C[y] from the polynomial ring in two
variables to the product ring by φ(f(x, y)) = (f(x, 0), f(0, y)). Its kernel is the set of
polynomials f(x, y) divisible both by y and by x, which is the principal ideal of C[x, y]
generated by xy. The map isn’t quite surjective. Its image is the subring of the product
consisting of pairs (p(x), q(y)) of polynomials with the same constant term. So the quotient
C[x, y]/(xy) is isomorphic to that subring. □
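
The claims in part (a) can be verified directly. The sketch below again stores a + bδ as a pair (a, b) over F_11 with δ^2 = 3. The helper names, and the name `e2` standing for e′, are assumptions made for illustration.

```python
P = 11


def add(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def mul(u, v):
    """Multiply a + b*delta and c + d*delta over F_11, where delta^2 = 3."""
    a, b = u
    c, d = v
    return ((a * c + 3 * b * d) % P, (a * d + b * c) % P)


e = ((-5) % P, 1)                 # e  =  delta - 5
e2 = ((-5) % P, (-1) % P)         # e' = -delta - 5

# e and e' are idempotent, they sum to 1, and their product is 0.
print(mul(e, e) == e, mul(e2, e2) == e2, add(e, e2) == (1, 0), mul(e, e2) == (0, 0))

# The map x -> (ex, e'x) from Proposition 11.6.2(d), computed on all of R'.
ring = [(a, b) for a in range(P) for b in range(P)]
image = {(mul(e, x), mul(e2, x)) for x in ring}
print(len({mul(e, x) for x in ring}), len({mul(e2, x) for x in ring}), len(image))
```

The 121 elements of R′ have 121 distinct images, and each of eR′ and e′R′ has 11 elements, which is consistent with R′ being isomorphic to F_11 × F_11.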


11.7 FRACTIONS


In this section we consider the use of fractions in rings other than the integers. For instance,
a fraction p/q of polynomials p and q, with q not zero, is called a rational function.

Let’s review the arithmetic of integer fractions. In order to apply the statements below 
to other rings, we denote the ring of integers by the neutral symbol R. 


• A fraction is a symbol a/b, where a and b are elements of R and b is not zero.
• Elements of R are viewed as fractions by the rule a = a/1.
• Two fractions a_1/b_1 and a_2/b_2 are equivalent, a_1/b_1 ≈ a_2/b_2, if the elements of R
that are obtained by “cross multiplying” are equal, i.e., if a_1 b_2 = a_2 b_1.
• Sums and products of fractions are given by a/b + c/d = (ad + bc)/(bd) and (a/b)(c/d) = (ac)/(bd).


We use the term “equivalent” in the third item because, strictly speaking, the fractions aren’t 
actually equal. 


A problem arises when one replaces the integers by an arbitrary ring R: In the 
definition of addition, the denominator of the sum is the product bd. Since denominators 
aren't allowed to be zero, bd had better not be zero. Since b and d are denominators, they 
aren't zero individually, but we need to know that the product of nonzero elements of R is 
nonzero. This turns out to be the only problem, but it isn’t always true. For example, in the 
ring Z/(6) of congruence classes modulo 6, the classes 2 and 3 are not zero, but 2·3 = 0. Or,
in a product R × R′ of nonzero rings, the idempotents (1, 0) and (0, 1) are nonzero elements
whose product is zero. One cannot work with fractions in those rings.


e An integral domain R, or just a domain for short, is a ring with this property: R is not the 
zero ring, and if a and b are elements of R whose product ab is zero, then a = 0 or b = 0. 


_ Any subring of a field is a domain, and if R is a domain, the polynomial ring R[x] is also a 
domain. 

An element a of a ring is called a zero divisor if it is nonzero, and if there is another 
nonzero element b such that ab = 0. An integral domain is a nonzero ring which contains 
no zero divisors. 

An integral domain R satisfies the cancellation law: 


(11.7.1) If ab = ac and a ≠ 0, then b = c.


For, from ab = ac it follows that a(b - c) = 0. Then since a ≠ 0 and since R is a domain,
b - c = 0. □
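The failure of these properties in Z/(6) can be seen by brute force. The following is a minimal sketch, not part of the text: the function names `zero_divisors` and `cancellation_fails` are hypothetical, and congruence classes are represented by the integers 0, ..., n - 1.

```python
def zero_divisors(n):
    """Zero divisors of Z/(n): nonzero classes a with a*b = 0 for some nonzero class b."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]


def cancellation_fails(n):
    """Triples (a, b, c) with a != 0 and b != c, but ab = ac in Z/(n)."""
    return [(a, b, c)
            for a in range(1, n)
            for b in range(n)
            for c in range(b + 1, n)
            if (a * b) % n == (a * c) % n]


print(zero_divisors(6))            # [2, 3, 4]
print(zero_divisors(7))            # []  (Z/(7) is a domain)
print(cancellation_fails(6)[:3])   # a few failures of the cancellation law modulo 6
```

For example, 2·1 = 2·4 in Z/(6) although 1 ≠ 4, so the cancellation law (11.7.1) fails there, while no such triple exists modulo a prime.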


Theorem 11.7.2 Let F be the set of equivalence classes of fractions of elements of an 
integral domain R. 


(a) With the laws defined as above, F is a field, called the fraction field of R. 
(b) R embeds as a subring of F by the rule a ⇝ a/1.


(c) Mapping Property: If R is embedded as a subring of another field ℱ, the rule a/b = ab^{-1}
embeds F into ℱ too.


The phrase “mapping property” is explained as follows: To write the property carefully, one
should imagine that the embedding of R into ℱ is given by an injective ring homomorphism
φ: R → ℱ. The assertion is then that the rule Φ(a/b) = φ(a)φ(b)^{-1} extends φ to an
injective homomorphism Φ: F → ℱ.

The proof of Theorem 11.7.2 has many parts. One must verify that what we call 
equivalence of fractions is indeed an equivalence relation, that addition and multiplication 
are well-defined on equivalence classes, that the axioms for a field hold, and that sending 
a ⇝ a/1 is an injective homomorphism R → F. Then one must check the mapping property.
All of these verifications are straightforward.

If we were the first people who wished to use fractions in a ring, we’d be nervous and 
would want to go carefully through each of the verifications. But they have been made many 
times. It seems sufficient to check a few of them to get a sense of what is involved. 

Let us check that equivalence of fractions is a transitive relation. Suppose that
a_1/b_1 ≈ a_2/b_2 and also that a_2/b_2 ≈ a_3/b_3. Then a_1 b_2 = a_2 b_1 and a_2 b_3 = a_3 b_2.
We multiply by b_3 and b_1:

a_1 b_2 b_3 = a_2 b_1 b_3 and a_2 b_3 b_1 = a_3 b_2 b_1.


Therefore a_1 b_2 b_3 = a_3 b_2 b_1. Cancelling b_2, a_1 b_3 = a_3 b_1. Thus a_1/b_1 ≈ a_3/b_3. Since we
used the cancellation law, the fact that R is a domain is essential here.

Next, we show that addition of fractions is well-defined. Suppose that a/b ≈ a'/b'
and c/d ≈ c'/d'. We must show that a/b + c/d ≈ a'/b' + c'/d', and to do that, we cross
multiply the expressions for the sums. We must show that u = (ad + bc)(b'd') is equal to
v = (a'd' + b'c')(bd). The relations ab' = a'b and cd' = c'd show that


u = adb'd' + bcb'd' = a'dbd' + bc'b'd = v. 
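The laws for fractions of integers can be sketched in a few lines. This is an illustration under stated assumptions, not the book's construction: the class name `Frac` is hypothetical, fractions are never reduced, and equality is implemented as the equivalence relation given by cross multiplying.

```python
class Frac:
    """A fraction a/b of integers with b != 0, compared up to equivalence."""

    def __init__(self, a, b):
        if b == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # a/b is equivalent to c/d exactly when ad = cb (cross multiplying).
        return self.a * other.b == other.a * self.b

    def __add__(self, other):
        # a/b + c/d = (ad + bc)/(bd)
        return Frac(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # (a/b)(c/d) = (ac)/(bd)
        return Frac(self.a * other.a, self.b * other.b)

    def __repr__(self):
        return f"{self.a}/{self.b}"


# Equivalent inputs give equivalent sums, although the symbols differ.
s1 = Frac(1, 2) + Frac(1, 3)   # the symbol 5/6
s2 = Frac(2, 4) + Frac(3, 9)   # the symbol 30/36
print(s1, s2, s1 == s2)
```

The two sums are different symbols, 5/6 and 30/36, but they are equivalent, which is what well-definedness of addition on equivalence classes asserts.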


Verification of the mapping property is routine too. The only thing worth remarking is
that, if R is contained in ℱ and if a/b is a fraction, then b ≠ 0, so the rule a/b = ab^{-1} makes
sense.

As mentioned above, a fraction of polynomials is called a rational function, and the 
fraction field of the polynomial ring K[x], where K is a field, is called the field of rational 
functions in x, with coefficients in K. This field is usually denoted by K(x): 


(11.7.3) K(x) = { equivalence classes of fractions f/g, where f and g
are polynomials, and g is not the zero polynomial }.

The rational functions we define here are equivalence classes of fractions of the formal 
polynomials that were defined in Section 11.2. If K is the field of real numbers, evaluation of a
rational function f(x)/g(x) defines an actual function on the real line, wherever g(x) ≠ 0. But as with
polynomials, we should distinguish the formally defined rational functions, which are 
fractions of formal polynomials, from the functions that they define. 


11.8 MAXIMAL IDEALS 


In this section we investigate the kernels of surjective homomorphisms

(11.8.1) φ: R → F

from a ring R to a field F.

Let φ be such a map. The field F has just two ideals, the zero ideal (0) and the unit
ideal (1) (11.3.19). The inverse image of the zero ideal is the kernel J of φ, and the inverse
image of the unit ideal is the unit ideal of R. The Correspondence Theorem tells us that the
only ideals of R that contain J are J and R. Because of this, J is called a maximal ideal.


• A maximal ideal M of a ring R is an ideal that isn’t equal to R, and that isn’t contained in
any ideal other than M and R: If an ideal J contains M, then J = M or J = R.


Proposition 11.8.2 

(a) Let φ: R → R′ be a surjective ring homomorphism, with kernel J. The image R′ is a
field if and only if J is a maximal ideal.

(b) An ideal J of a ring R is maximal if and only if the quotient ring R/J is a field.

(c) The zero ideal of a ring R is maximal if and only if R is a field. 


Proof. (a) A ring is a field if and only if it contains precisely two ideals (11.3.19), so the
Correspondence Theorem asserts that the image of φ is a field if and only if there are precisely
two ideals that contain its kernel J. This will be true if and only if J is a maximal ideal.

Parts (b) and (c) follow when (a) is applied to the canonical map R → R/J. □


Proposition 11.8.3 The maximal ideals of the ring Z of integers are the principal ideals
generated by prime integers.


Proof. Every ideal of Z is principal. Consider a principal ideal (n), with n ≥ 0. If n is a
prime, say n = p, then Z/(n) = F_p, a field. The ideal (n) is maximal. If n is not prime, there
are three possibilities: n = 0, n = 1, or n factors. Neither the zero ideal nor the unit ideal
is maximal. If n factors, say n = ab, with 1 < a < n, then 1 ∉ (a), a ∉ (n), and n ∈ (a).
Therefore (n) < (a) < (1). The ideal (n) is not maximal. □
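Proposition 11.8.3 can be checked numerically for small n. The following is a sketch only: the helper names `is_field` and `is_prime` are hypothetical, and the test of being a field is a brute-force search for inverses in Z/(n).

```python
def is_field(n):
    """True if Z/(n) is a field: n >= 2 and every nonzero class has an inverse."""
    return n >= 2 and all(
        any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n)
    )


def is_prime(n):
    """Trial division; adequate for the small values used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


# (n) is maximal exactly when Z/(n) is a field (Proposition 11.8.2(b)).
maximal = [n for n in range(0, 30) if is_field(n)]
print(maximal)
```

The values n = 0 and n = 1 are excluded by hand in `is_field`, matching the proof: Z/(0) = Z is not a field, and Z/(1) is the zero ring.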


• A polynomial with coefficients in a field is called irreducible if it is not constant and if it is
not the product of two polynomials, neither of which is a constant.


Proposition 11.8.4 


(a) Let F be a field. The maximal ideals of F[x] are the principal ideals generated by the 
monic irreducible polynomials. 

(b) Let φ: F[x] → R′ be a homomorphism to an integral domain R′, and let P be the kernel
of φ. Either P is a maximal ideal, or P = (0).


The proof of part (a) is analogous to the proof just given. We omit the proof of (b). □


Corollary 11.8.5 There is a bijective correspondence between maximal ideals of the
polynomial ring C[x] in one variable and points in the complex plane. The maximal ideal
M_a that corresponds to a point a of C is the kernel of the substitution homomorphism
s_a: C[x] → C that sends x ⇝ a. It is the principal ideal generated by the linear polynomial
x - a.


Proof. The kernel M_a of the substitution homomorphism s_a consists of the polynomials
that have a as a root, which are those divisible by x - a. So M_a = (x - a). Conversely, let
M be a maximal ideal of C[x]. Then M is generated by a monic irreducible polynomial. The
monic irreducible polynomials in C[x] are the polynomials x - a. □
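The fact used in this proof, that f lies in the kernel of s_a exactly when x - a divides f, can be illustrated with synthetic division. This is a sketch under assumptions of our own: the function name `divide_by_linear` is hypothetical, and a polynomial is given by its list of coefficients from the highest degree down to the constant term.

```python
def divide_by_linear(coeffs, a):
    """Divide f(x) by x - a using Horner's scheme.

    Returns (quotient coefficients, remainder); the remainder equals f(a).
    """
    partial = []
    carry = 0
    for c in coeffs:
        carry = carry * a + c
        partial.append(carry)
    return partial[:-1], partial[-1]


# f = x^2 - 3x + 2 has the root 1, so x - 1 divides f with quotient x - 2.
print(divide_by_linear([1, -3, 2], 1))   # ([1, -2], 0)

# The same f evaluated at 3: remainder f(3) = 2, so f is not in M_3.
print(divide_by_linear([1, -3, 2], 3))   # ([1, 0], 2)

# Complex points work the same way: x^2 + 1 lies in M_i.
quotient, remainder = divide_by_linear([1, 0, 1], 1j)
print(remainder == 0)
```

The remainder is the value s_a(f) = f(a), so it vanishes precisely for the polynomials in M_a.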


The next theorem extends this corollary to polynomial rings in several variables.


Theorem 11.8.6 Hilbert’s Nullstellensatz.¹ The maximal ideals of the polynomial ring
C[x_1, ..., x_n] are in bijective correspondence with points of complex n-dimensional space.
A point a = (a_1, ..., a_n) of C^n corresponds to the kernel M_a of the substitution map
s_a: C[x_1, ..., x_n] → C that sends x_i ⇝ a_i. The kernel M_a is generated by the n linear
polynomials x_i - a_i.


Proof. Let a be a point of C^n, and let M_a be the kernel of s_a. Since s_a is surjective and since
C is a field, M_a is a maximal ideal. To verify that M_a is generated by the linear polynomials
as asserted, we first consider the case that the point a is the origin (0, ..., 0). We must show
that the kernel of the map s_0 that evaluates a polynomial at the origin is generated by the
variables x_1, ..., x_n. Well, f(0, ..., 0) = 0 if and only if the constant term of f is zero. If
so, then every monomial that occurs in f is divisible by at least one of the variables, so f can
be written as a linear combination of the variables, with polynomial coefficients. The proof
for an arbitrary point a can be made using the change of variable x_i = x_i′ + a_i to move a to
the origin.


¹ The German word Nullstellensatz is a combination of three words whose translations are zero, places, theorem.

It is harder to prove that every maximal ideal has the form M_a. Let M be a maximal
ideal, and let F denote the field C[x_1, ..., x_n]/M. We restrict the canonical map (11.4.1)
π: C[x_1, ..., x_n] → F to the subring C[x_1] of polynomials in the first variable, obtaining a
homomorphism φ: C[x_1] → F. Proposition 11.8.4 shows that the kernel of φ is either the
zero ideal, or one of the maximal ideals (x_1 - a_1) of C[x_1]. We’ll show that it cannot be the
zero ideal. The same will be true when the index 1 is replaced by any other index, so M will
contain linear polynomials of the form x_i - a_i for each i. This will show that M contains one
of the ideals M_a, and since M_a is maximal, M will be equal to that ideal.

In what follows, we drop the subscript from x_1. We suppose that ker φ = (0). Then
φ maps C[x] isomorphically to its image, a subring of F. The mapping property of fraction
fields shows that this map extends to an injective map C(x) → F, where C(x) is the field of
rational functions, that is, the field of fractions of the polynomial ring C[x]. So F contains a field
isomorphic to C(x). The next lemma shows that this is impossible. Therefore ker φ ≠ (0).


Lemma 11.8.7 


(a) Let R be a ring that contains the complex numbers C as a subring. The laws of 
composition on R can be used to make R into a complex vector space. 

(b) As a vector space, the field F = C[x_1, ..., x_n]/M is spanned by a countable set of
elements.


(c) Let V be a vector space over a field, and suppose that V is spanned by a countable set
of vectors. Then every independent subset of V is finite or countably infinite.


(d) When C(x) is made into a vector space over C, the uncountable set of rational functions
(x - α)^{-1}, with α in C, is independent.


Assume that the lemma has been proved. Then (b) and (c) show that every independent set
in F is finite or countably infinite. On the other hand, F contains a subring isomorphic to
C(x), so by (d), F contains an uncountable independent set. This is a contradiction. □


Proof of the Lemma. (a) For addition, one uses the addition law in R. Scalar multiplication 
ca of an element a of R by an element c of C is defined by multiplying these elements in R. 
The axioms for a vector space follow from the ring axioms. 


(b) The surjective homomorphism π: C[x_1, ..., x_n] → F defines a map C → F, by means
of which we identify C as a subring of F, and make F into a complex vector space. The
countable set of monomials x_1^{i_1} ⋯ x_n^{i_n} forms a basis for C[x_1, ..., x_n], and since π is
surjective, the images of these monomials span F.


(c) Let S be a countable set that spans V, say S = {v_1, v_2, ...}. It could be finite or infinite.
Let S_n be the subset (v_1, ..., v_n) consisting of the first n elements of S, and let V_n be the
span of S_n. If S is infinite, there will be infinitely many of these subspaces. Since S spans V,
every element of V is a linear combination of finitely many elements of S, so it is in one of
the spaces V_n. In other words, ⋃ V_n = V.

Let L be an independent set in V, and let L_n = L ∩ V_n. Then L_n is a linearly
independent subset of the space V_n, which is spanned by a set of n elements. So |L_n| ≤ n
(3.4.18). Moreover, L = ⋃ L_n because V = ⋃ V_n. The union of countably many finite sets
is finite or countably infinite.


(d) We must remember that linear combinations can involve only finitely many vectors. So
we ask: Can we have a linear relation

Σ_{ν=1}^{k} c_ν/(x - α_ν) = 0,

where α_1, ..., α_k are distinct complex numbers and the coefficients c_ν aren’t zero? No. Such
a linear combination of formal rational functions defines a complex valued function except
at the points x = α_ν. If the linear combination were zero, the function it defines would be
identically zero. But (x - α_1)^{-1} takes on arbitrarily large values near α_1, while (x - α_ν)^{-1}
is bounded near α_1, for ν = 2, ..., k. So the linear combination does not define the zero
function. □


11.9 ALGEBRAIC GEOMETRY 


A point (a_1, ..., a_n) of C^n is called a zero of a polynomial f(x_1, ..., x_n) of n variables
if f(a_1, ..., a_n) = 0. We also say that the polynomial f vanishes at such a point. The
common zeros of a set {f_1, ..., f_r} of polynomials are the points of C^n at which all of them
vanish, that is, the solutions of the system of equations f_1 = ⋯ = f_r = 0.


• A subset V of complex n-space C^n that is the set of common zeros of a finite number of
polynomials in n variables is called an algebraic variety, or just a variety.


For instance, a complex line in the (x, y)-plane C^2 is, by definition, the set of solutions
of a linear equation ax + by + c = 0. This is a variety. So is a point. The point (a, b) of C^2
is the set of common zeros of the two polynomials x - a and y - b. The group SL_2(C) is a
variety in C^{2×2}. It is the set of zeros of the polynomial x_11 x_22 - x_12 x_21 - 1.
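Membership in a variety is simply the vanishing of each defining polynomial. The sketch below is illustrative only: the function name `in_variety` and the encoding of polynomials as Python functions are assumptions of ours, and the sample points are chosen arbitrarily.

```python
def in_variety(polys, point):
    """True if every polynomial in polys vanishes at the given point."""
    return all(f(*point) == 0 for f in polys)


# SL_2(C) as a variety in C^{2x2}: zeros of x11*x22 - x12*x21 - 1.
sl2 = [lambda x11, x12, x21, x22: x11 * x22 - x12 * x21 - 1]

# The point (a, b) = (2, 3i) of C^2: common zeros of x - a and y - b.
point_2_3i = [lambda x, y: x - 2, lambda x, y: y - 3j]

print(in_variety(sl2, (1, 1, 0, 1)))       # True: determinant 1
print(in_variety(sl2, (2, 0, 0, 2)))       # False: determinant 4
print(in_variety(point_2_3i, (2, 3j)))     # True
```

Exact equality with zero is reliable here because the sample coordinates are integers or Gaussian integers; for general floating-point input one would compare against a tolerance instead.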


The Nullstellensatz provides an important link between algebra and geometry. It tells
us that the maximal ideals in the polynomial ring C[x_1, ..., x_n] correspond to points in
C^n. This correspondence also relates algebraic varieties to quotient rings of the polynomial
ring.


Theorem 11.9.1 Let J be the ideal of C[x_1, ..., x_n] generated by some polynomials
f_1, ..., f_r, and let R be the quotient ring C[x_1, ..., x_n]/J. Let V be the variety of (common)
zeros of the polynomials f_1, ..., f_r in C^n. The maximal ideals of R are in bijective
correspondence with the points of V.


Proof. The maximal ideals of R correspond to the maximal ideals of C[x_1, ..., x_n] that
contain J (Correspondence Theorem). An ideal of C[x_1, ..., x_n] will contain J if and only
if it contains the generators f_1, ..., f_r of J. Every maximal ideal of the ring C[x_1, ..., x_n]
is the kernel M_a of the substitution map that sends x_i ⇝ a_i for some point a = (a_1, ..., a_n)
of C^n, and the polynomials f_1, ..., f_r are in M_a if and only if f_1(a) = ⋯ = f_r(a) = 0,
which is to say, if and only if a is a point of V. □

As this theorem suggests, algebraic properties of the ring R = C[x_1, ..., x_n]/J are closely related
to geometric properties of the variety V. The analysis of this relationship is the field of
mathematics called algebraic geometry.


A simple question one might ask about a set is whether or not it is empty. Is it possible 
for a ring to have no maximal ideals at all? This happens only for the zero ring. 


Theorem 11.9.2 Let R be a ring. Every ideal J of R that is not R itself is contained in a 
maximal ideal. 


To find a maximal ideal, one might try this procedure: If J is not maximal, choose a proper
ideal J′ that is larger than J. Replace J by J′, and repeat. The proof follows this line of
reasoning, but one may have to repeat the procedure many times, possibly uncountably
often. Because of this, the proof requires the Axiom of Choice, or Zorn’s Lemma (see the
Appendix). The Hilbert Basis Theorem, which we will prove later (14.6.7), shows that for
most rings that we study, the proof requires only a weak countable version of the Axiom of
Choice. Rather than enter into a discussion of the Axiom of Choice here, we defer further
discussion of the proof to Chapter 14. □


Corollary 11.9.3. The only ring R having no maximal ideals is the zero ring. 


This follows from the theorem, because every nonzero ring R contains an ideal different
from R: the zero ideal. □


Putting Theorems 11.9.1 and 11.9.2 together gives us another corollary: 


Corollary 11.9.4 If a system of polynomial equations f_1 = ⋯ = f_r = 0 in n variables has
no solution in C^n, then 1 is a linear combination 1 = Σ g_i f_i with polynomial coefficients g_i.


Proof. If the system has no solution, there is no maximal ideal that contains the ideal
J = (f_1, ..., f_r). So J is the unit ideal, and 1 is in J. □


Example 11.9.5 Most choices of three polynomials f_1, f_2, f_3 in two variables have no
common solutions. For instance, the ideal of C[t, x] generated by


(11.9.6) f_1 = t^2 + x^2 - 2, f_2 = tx - 1, f_3 = t^3 + 5tx^2 + 1


is the unit ideal. This can be proved by showing that the equations f_1 = f_2 = f_3 = 0 have
no solution in C^2.
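The check suggested in Example 11.9.5 can be carried out by hand and confirmed in a few lines. The sketch below assumes that the polynomials of (11.9.6) read f_1 = t^2 + x^2 - 2, f_2 = tx - 1, f_3 = t^3 + 5tx^2 + 1; the display is damaged in the source, so this reading is an assumption.

```python
def f1(t, x):
    return t**2 + x**2 - 2


def f2(t, x):
    return t * x - 1


def f3(t, x):
    return t**3 + 5 * t * x**2 + 1


# On the locus f2 = 0 we have x = 1/t, and
#   t^2 * f1(t, 1/t) = t^4 - 2t^2 + 1 = (t^2 - 1)^2,
# so the only common zeros of f1 and f2 are (1, 1) and (-1, -1).
common_f1_f2 = [(1, 1), (-1, -1)]

# f3 does not vanish at either of them, so the three equations
# have no common solution in C^2.
values = [f3(t, x) for (t, x) in common_f1_f2]
print(values)   # [7, -5]
```

Under this reading, Corollary 11.9.4 guarantees polynomials g_1, g_2, g_3 with g_1 f_1 + g_2 f_2 + g_3 f_3 = 1; the code does not compute them.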


It isn’t easy to get a clear geometric picture of an algebraic variety in C^n, but the
general shape of a variety in C^2 can be described fairly simply, and we do that here. We
work with the polynomial ring in the two variables t and x.


Lemma 11.9.7 Let f(t, x) be a polynomial, and let α be a complex number. The following
are equivalent:
(a) f(t, x) vanishes at every point of the locus {t = α} in C^2,
(b) The one-variable polynomial f(α, x) is the zero polynomial,
(c) t - α divides f in C[t, x].


Proof. If f vanishes at every point of the locus t = α, the polynomial f(α, x) is zero for
every x. Then since a nonzero polynomial in one variable has finitely many roots, f(α, x) is
the zero polynomial. This shows that (a) implies (b).

A change of variable t = t′ + α reduces the proof that (b) implies (c) to the case that
α = 0. If f(0, x) is the zero polynomial, then t divides every monomial that occurs in f, and
t divides f. Finally, the implication (c) implies (a) is clear. □


Let F denote the field of rational functions C(t) in t, the field of fractions of the ring
C[t]. The ring C[t, x] is a subring of the one-variable polynomial ring F[x]; its elements are
polynomials in x,


(11.9.8) f(t, x) = a_n(t)x^n + ⋯ + a_1(t)x + a_0(t),


whose coefficients a_i(t) are rational functions in t. It can be helpful to begin by studying
a problem about C[t, x] in the ring F[x], because its algebra is simpler. Division with
remainder is available, and every ideal of F[x] is principal.


Proposition 11.9.9 Let h(t, x) and f(t, x) be nonzero elements of C[t, x]. Suppose that h
is not divisible by any polynomial of the form t - α. If h divides f in F[x], then h divides f
in C[t, x].


Proof. We divide by h in F[x], say f = hq, and we show that q is an element of C[t, x].
Since q is an element of F[x], it is a polynomial in x whose coefficients are rational functions
in t. We multiply both sides of the equation f = hq by a monic polynomial in t to clear
denominators in these coefficients. This gives us an equation of the form


u(t) f(t, x) = h(t, x) q_1(t, x),


where u(t) is a monic polynomial in t, and q_1 is an element of C[t, x]. We use induction on
the degree of u. If u has positive degree, it will have a complex root α. Then t - α divides
the left side of this equation, so it divides the right side too. This means that h(α, x) q_1(α, x)
is the zero polynomial in x. By hypothesis, t - α does not divide h, so h(α, x) is not zero.
Since the polynomial ring C[x] is a domain, q_1(α, x) = 0, and the lemma shows that t - α
divides q_1(t, x). We cancel t - α from u and q_1. Induction completes the proof. □


Theorem 11.9.10 Two nonzero polynomials f(t, x) and g(t, x) in two variables have
only finitely many common zeros in C^2, unless they have a common nonconstant factor
in C[t, x].


If the degrees of the polynomials f and g are m and n respectively, the number
of common zeros is at most mn. This is known as the Bézout bound. For instance, two
quadratic polynomials have at most four common zeros. (The analogue of this statement for
real polynomials is that two conics intersect in at most four points.) It is harder to prove the
Bézout bound than the finiteness. We won’t need that bound, so we won’t prove it.


Proof of Theorem 11.9.10. Assume that f and g have no common factor. Let J denote the
ideal generated by f and g in F[x], where F = C(t), as above. This is a principal ideal,
generated by the (monic) greatest common divisor h of f and g in F[x].

If h ≠ 1, it will be a polynomial whose coefficients may have denominators that are
polynomials in t. We multiply by a polynomial in t to clear these denominators, obtaining
a polynomial h_1 in C[t, x]. We may assume that h_1 isn’t divisible by any polynomial t - α.
Since the denominators are units in F and since h divides f and g in F[x], h_1 also divides
f and g in F[x]. Proposition 11.9.9 shows that h_1 divides f and g in C[t, x]. Then f and g
have a common nonconstant factor in C[t, x]. We’re assuming that this is not the case.

So the greatest common divisor of f and g in F[x] is 1, and 1 = rf + sg, where r and
s are elements of F[x]. We clear denominators from r and s, multiplying both sides of the
equation by a suitable polynomial u(t). This gives us an equation of the form


u(t) = r_1(t, x) f(t, x) + s_1(t, x) g(t, x),


where all terms on the right are polynomials in C[t, x]. This equation shows that if (t_0, x_0)
is a common zero of f and g, then t_0 must be a root of u. But u is a polynomial in t, and
a nonzero polynomial in one variable has finitely many roots. So at the common zeros of
f and g, the variable t takes on only finitely many values. Similar reasoning shows that
x takes on only finitely many values. This gives us only finitely many possibilities for the
common zeros. □
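A small instance makes the role of u(t) concrete. The polynomials f = x^2 - t and g = x - t below are our own illustrative choice, not taken from the text. Dividing f by g in F[x] gives f = (x + t)g + (t^2 - t), so u(t) = t^2 - t with r_1 = 1 and s_1 = -(x + t); in this case no denominators need to be cleared.

```python
def f(t, x):
    return x * x - t


def g(t, x):
    return x - t


def u(t):
    return t * t - t


# Verify the identity u(t) = 1*f(t, x) - (x + t)*g(t, x) on a grid of integer
# points. Both sides have degree at most 2 in each variable, so agreement on
# an 11 x 11 grid implies that they agree as polynomials.
identity_holds = all(
    f(t, x) - (x + t) * g(t, x) == u(t)
    for t in range(-5, 6)
    for x in range(-5, 6)
)

# At a common zero, t must be a root of u; on g = 0 we have x = t.
roots_of_u = [t for t in range(-5, 6) if u(t) == 0]
common_zeros = [(t, t) for t in roots_of_u if f(t, t) == 0]

print(identity_holds, roots_of_u, common_zeros)
```

The two common zeros (0, 0) and (1, 1) are consistent with the Bézout bound mn = 2·1 = 2 mentioned above.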


Theorem 11.9.10 suggests that the most interesting varieties in C^2 are those defined as
the locus of zeros of a single polynomial f(t, x).


• The locus X of zeros in C^2 of a polynomial f(t, x) is called the Riemann surface of f.


It is also called a plane algebraic curve, a confusing phrase. As a topological space, the
locus X has dimension two. Calling it an algebraic curve refers to the fact that the points
of X depend only on one complex parameter. We give a rough description of a Riemann
surface here. Let’s assume that the polynomial f is irreducible, meaning that it is not a product of
two nonconstant polynomials, and also that it has positive degree in the variable x. Let


(11.9.11) X = {(t, x) ∈ C^2 | f(t, x) = 0}


be its Riemann surface, and let T denote the complex t-plane. Sending (t, x) ⇝ t defines a
continuous map that we call a projection


(11.9.12) π: X → T.


We will describe X in terms of this projection. However, our description will require that a
finite set of “bad points” be removed from X. In fact, what is usually called the Riemann
surface agrees with our definition only when suitable finite subsets are removed. The locus
{f = 0} may be “singular” at some points, and some other points of X may be “at infinity.”
The points at infinity are explained below (see (11.9.17)).




The simplest examples of singular points are nodes, at which the surface crosses itself,
and cusps. The locus x^2 = t^2 - t^3 has a node at the origin, and the locus x^2 = t^3 has a cusp
at the origin. The real points of these Riemann surfaces are shown here.


[Figure: a node (left) and a cusp (right)]
(11.9.13) Some Singular Curves


To avoid repetition of the disclaimer “except on a finite set,” we write X′ for the
complement of an unspecified finite subset of X, which is allowed to vary. Whenever
a construction runs into trouble at some point, we simply delete that point. Essentially
everything we do here and when we come back to Riemann surfaces in Chapter 15 will be
valid only for X′. We keep X on hand for reference.

Our description of the Riemann surface will be as a branched covering of the complex
t-plane T. The definition of covering space that we give here assumes that the spaces are
Hausdorff spaces ([Munkres] p. 98). You can ignore this point if you don’t know what it means.
The sets in which we are interested are Hausdorff spaces because they are subsets of C^2.


Definition 11.9.14 Let X and T be Hausdorff spaces. A continuous map π: X → T is an
n-sheeted covering space if every fibre consists of n points, and if it has this property: Let
x_0 be a point of X and let π(x_0) = t_0. Then π maps an open neighborhood U of x_0 in X
homeomorphically to an open neighborhood V of t_0 in T.


A map π from X to the complex plane T is an n-sheeted branched covering if X contains
no isolated points, the fibres of π are finite, and if there is a finite set A of points of T called
branch points, such that the map (X - π^{-1}A) → (T - A) is an n-sheeted covering space.
For emphasis, a covering space is sometimes called an unbranched covering.


Figure 11.9.15 below depicts the Riemann surface of the polynomial x^2 - t, a two-sheeted
covering of T that is branched at the point t = 0. The figure has been obtained by
writing t and x in terms of their real and imaginary parts, t = t_0 + t_1 i and x = x_0 + x_1 i,
and dropping the imaginary part x_1 of x, to obtain a surface in three-dimensional space. Its
further projection to the plane is depicted using standard graphics.

The projected surface intersects itself along the negative t-axis, though the Riemann
surface itself does not. Every negative real number t has two purely imaginary square roots.
The real parts of these square roots are zero, and this produces the self-crossing in the
projected surface.


[Figure] (11.9.14) Part of an unbranched covering.


[Figure] (11.9.15) The Riemann surface x^2 = t.


Given a branched covering X → T, we refer to the points in the set A as its branch
points, though this is imprecise: The defining property continues to hold when we add any
finite set of points to A. So we allow the possibility that some points of A don’t need to be
included, that is, that they aren’t “true” branch points.


Theorem 11.9.16 Let f(t, x) be an irreducible polynomial in C[t, x] which has positive 
degree n in the variable x. The Riemann surface of f is an n-sheeted branched covering of 
the complex plane T.


Proof. The main step is to verify the first condition of (11.9.14), that the fibre π^{-1}(t_0) consists
of precisely n points except on a finite subset A.

The points of the fibre π^{-1}(t_0) are the points (t_0, x_0) such that x_0 is a root of the
one-variable polynomial f(t_0, x). We must show that, except for a finite set of values t = t_0,
this polynomial has n distinct roots. We write f(t, x) as a polynomial in x whose coefficients
are polynomials in t, say f(t, x) = a_n(t)x^n + ⋯ + a_0(t), and we denote a_i(t_0) by a_i^0. The
polynomial f(t_0, x) = a_n^0 x^n + ⋯ + a_1^0 x + a_0^0 has degree at most n, so it has at most n roots.
Therefore the fibre π^{-1}(t_0) contains at most n points. It will have fewer than n points if
either


(11.9.17)
(a) the degree of f(t_0, x) is less than n, or
(b) f(t_0, x) has a multiple root.


The first case occurs when t_0 is a root of a_n(t). (If t_0 is a root of a_n(t), one of the roots
of f(t_1, x) tends to infinity as t_1 → t_0.) Since a_n(t) is a polynomial, there are finitely many
such values.

Consider the second case. A complex number x_0 is a multiple root of a polynomial
h(x) if (x - x_0)^2 divides h(x), and this happens if and only if x_0 is a common root of h(x)
and its derivative h′(x) (see Exercise 3.5). Here h(x) = f(t_0, x). The first variable is fixed,
so the derivative is the partial derivative ∂f/∂x. Going back to the polynomial f(t, x) in two
variables, we see that the second case occurs at the points (t_0, x_0) that are common zeros of
f and ∂f/∂x. Now f cannot divide its partial derivative, which has lower degree in x. Since f is
assumed to be irreducible, f and ∂f/∂x have no common nonconstant factor. Theorem 11.9.10
tells us that there are finitely many common zeros.

We now check the second condition of (11.9.14). Let t_0 be a point of T such that the
fibre π^{-1}(t_0) consists of n points, and let (t_0, x_0) be a point of X in the fibre. Then x_0 is
a simple root of f(t_0, x), and therefore ∂f/∂x is not zero at this point. The Implicit Function
Theorem A.4.3 implies that one can solve for x as a function x(t) of t in a neighborhood of
t_0, such that x(t_0) = x_0. The neighborhood U referred to in the definition of covering space
is the graph of this function. □
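The two cases of (11.9.17) can be traced on a concrete polynomial. The example f(t, x) = x^3 - 3x - t below is our own choice for illustration, not one from the text. Its leading coefficient in x is 1, so case (a) never occurs, and case (b) occurs at the common zeros of f and ∂f/∂x = 3x^2 - 3.

```python
def f(t, x):
    return x**3 - 3 * x - t


def df_dx(t, x):
    # Partial derivative of f with respect to x.
    return 3 * x**2 - 3


# df/dx = 0 forces x = 1 or x = -1; then f = 0 determines t = x^3 - 3x.
multiple_root_points = [(x**3 - 3 * x, x) for x in (1, -1)]
branch_points = sorted(t for (t, _) in multiple_root_points)
print(multiple_root_points)   # [(-2, 1), (2, -1)]
print(branch_points)          # [-2, 2]

# Over t = -2 the fibre is {x = 1 (double root), x = -2}: two points, not three.
fibre_over_minus_2 = [x for x in range(-5, 6) if f(-2, x) == 0]
# Over t = 0 the roots are 0 and the two square roots of 3; only 0 is an integer.
print(fibre_over_minus_2)     # [-2, 1]
```

So, in this sketch, the Riemann surface of f is a three-sheeted branched covering of T with A = {-2, 2}; away from those two values of t the fibre has three distinct points.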


To me algebraic geometry is algebra with a kick. 


—Solomon Lefschetz 




EXERCISES 


Section 1 Definition of a Ring 


1.1. Prove that 7 + ∛2 and √3 + √-5 are algebraic numbers.

1.2. Prove that, for n ≠ 0, cos(2π/n) is an algebraic number.

1.3. Let Q[α, β] denote the smallest subring of C containing the rational numbers Q and the
elements α = √2 and β = √3. Let γ = α + β. Is Q[α, β] = Q[γ]? Is Z[α, β] = Z[γ]?

1.4. Let α = (1/2)i. Prove that the elements of Z[α] are dense in the complex plane.

1.5. Determine all subrings of R that are discrete sets. 

1.6. Decide whether or not S is a subring of R, when 


(a) S is the set of all rational numbers a/b, where b is not divisible by 3, and R = Q, 


(b) S is the set of functions which are linear combinations with integer coefficients of the
functions {1, cos nt, sin nt}, n ∈ Z, and R is the set of all real valued functions of t.


1.7. Decide whether the given structure forms a ring. If it is not a ring, determine which of the 
ring axioms hold and which fail: 


(a) U is an arbitrary set, and R is the set of subsets of U. Addition and multiplication of
elements of R are defined by the rules A + B = (A ∪ B) - (A ∩ B) and A · B = A ∩ B.


(b) R is the set of continuous functions ℝ → ℝ. Addition and multiplication are defined
by the rules [f + g](x) = f(x) + g(x) and [f ∘ g](x) = f(g(x)).

1.8. Determine the units in: (a) Z/12Z, (b) Z/8Z, (c) Z/nZ.


1.9. Let R be a set with two laws of composition satisfying all ring axioms except the 
commutative law for addition. Use the distributive law to prove that the commutative law 
for addition holds, so that R is a ring. 


Section 2 Polynomial Rings 


2.1. For which positive integers n does x^2 + x + 1 divide x^4 + 3x^3 + x^2 + 7x + 5 in [Z/(n)][x]?


2.2. Let F be a field. The set of all formal power series p(t) = a_0 + a_1 t + a_2 t^2 + ⋯, with a_i
in F, forms a ring that is often denoted by F[[t]]. By formal power series we mean that
the coefficients form an arbitrary sequence of elements of F. There is no requirement of
convergence. Prove that F[[t]] is a ring, and determine the units in this ring.


Section 3 Homomorphisms and Ideals


3.1. Prove that an ideal of a ring R is a subgroup of the additive group R^+.

3.2. Prove that every nonzero ideal in the ring of Gauss integers contains a nonzero integer.

3.3. Find generators for the kernels of the following maps:

(a) R[x, y] → R defined by f(x, y) ⇝ f(0, 0),

(b) R[x] → C defined by f(x) ⇝ f(2 + i),

(c) Z[x] → R defined by f(x) ⇝ f(1 + √2),

(d) Z[x] → C defined by x ⇝ √2 + √3,

(e) C[x, y, z] → C[t] defined by x ⇝ t, y ⇝ t^2, z ⇝ t^3.

3.4. Let φ: C[x, y] → C[t] be the homomorphism that sends x ⇝ t + 1 and y ⇝ t^3 - 1. Determine
the kernel K of φ, and prove that every ideal I of C[x, y] that contains K can be generated
by two elements.

3.5. The derivative of a polynomial f with coefficients in a field F is defined by the calculus
formula (a_n x^n + ⋯ + a_1 x + a_0)′ = n a_n x^{n-1} + ⋯ + 1a_1. The integer coefficients are
interpreted in F using the unique homomorphism Z → F.

(a) Prove the product rule (fg)′ = f′g + fg′ and the chain rule (f ∘ g)′ = (f′ ∘ g)g′.

(b) Let α be an element of F. Prove that α is a multiple root of a polynomial f if and only
if it is a common root of f and of its derivative f′.

3.6. An automorphism of a ring R is an isomorphism from R to itself. Let R be a ring,
and let f(y) be a polynomial in one variable with coefficients in R. Prove that the map
R[x, y] → R[x, y] defined by x ⇝ x + f(y), y ⇝ y is an automorphism of R[x, y].

3.7. Determine the automorphisms of the polynomial ring Z[x] (see Exercise 3.6).

3.8. Let R be a ring of prime characteristic p. Prove that the map R → R defined by x ⇝ x^p is
a ring homomorphism. (It is called the Frobenius map.)

3.9. (a) An element x of a ring R is called nilpotent if some power is zero. Prove that if x is
nilpotent, then 1 + x is a unit.

(b) Suppose that R has prime characteristic p ≠ 0. Prove that if a is nilpotent then 1 + a is
unipotent, that is, some power of 1 + a is equal to 1.

3.10. Determine all ideals of the ring F[[t]] of formal power series with coefficients in a field F
(see Exercise 2.2).

3.11. Let R be a ring, and let I be an ideal of the polynomial ring R[x]. Let n be the lowest
degree among nonzero elements of I. Prove or disprove: I contains a monic polynomial of
degree n if and only if it is a principal ideal.

3.12. Let I and J be ideals of a ring R. Prove that the set I + J of elements of the form x + y,
with x in I and y in J, is an ideal. This ideal is called the sum of the ideals I and J.

3.13. Let I and J be ideals of a ring R. Prove that the intersection I ∩ J is an ideal. Show by
example that the set of products {xy | x ∈ I, y ∈ J} need not be an ideal, but that the set
of finite sums Σ x_ν y_ν of products of elements of I and J is an ideal. This ideal is called
the product ideal, and is denoted by IJ. Is there a relation between IJ and I ∩ J?


Section 4 Quotient Rings 


4.1. Consider the homomorphism $\mathbb{Z}[x] \to \mathbb{Z}$ that sends $x \mapsto 1$. Explain what the
Correspondence Theorem, when applied to this map, says about ideals of $\mathbb{Z}[x]$.

4.2. What does the Correspondence Theorem tell us about ideals of $\mathbb{Z}[x]$ that contain $x^2 + 1$?

4.3. Identify the following rings: (a) $\mathbb{Z}[x]/(x^2 - 3, 2x + 4)$, (b) $\mathbb{Z}[i]/(2 + i)$,
(c) $\mathbb{Z}[x]/(6, 2x - 1)$, (d) $\mathbb{Z}[x]/(2x^2 - 4, 4x - 5)$, (e) $\mathbb{Z}[x]/(x^2 + 3, 5)$.

4.4. Are the rings $\mathbb{Z}[x]/(x^2 + 7)$ and $\mathbb{Z}[x]/(2x^2 + 7)$ isomorphic?

Section 5 Adjoining Elements


5.1. Let $f = x^4 + x^3 + x^2 + x + 1$ and let $\alpha$ denote the residue of $x$ in the ring $R = \mathbb{Z}[x]/(f)$.
Express $(\alpha^3 + \alpha^2 + \alpha)(\alpha^5 + 1)$ in terms of the basis $(1, \alpha, \alpha^2, \alpha^3)$ of $R$.

5.2. Let $a$ be an element of a ring $R$. If we adjoin an element $\alpha$ with the relation $\alpha = a$, we
expect to get a ring isomorphic to $R$. Prove that this is true.

5.3. Describe the ring obtained from $\mathbb{Z}/12\mathbb{Z}$ by adjoining an inverse of 2.

5.4. Determine the structure of the ring $R'$ obtained from $\mathbb{Z}$ by adjoining an element $\alpha$ satisfying
each set of relations.
(a) $2\alpha = 6$, $6\alpha = 15$, (b) $2\alpha - 6 = 0$, $\alpha - 10 = 0$, (c) $\alpha^3 + \alpha^2 + 1 = 0$, $\alpha^2 + \alpha = 0$.

5.5. Are there fields $F$ such that the rings $F[x]/(x^2)$ and $F[x]/(x^2 - 1)$ are isomorphic?

5.6. Let $a$ be an element of a ring $R$, and let $R'$ be the ring $R[x]/(ax - 1)$ obtained by adjoining
an inverse of $a$ to $R$. Let $\alpha$ denote the residue of $x$ (the inverse of $a$ in $R'$).
(a) Show that every element $\beta$ of $R'$ can be written in the form $\beta = \alpha^k b$, with $b$ in $R$.
(b) Prove that the kernel of the map $R \to R'$ is the set of elements $b$ of $R$ such that
$a^n b = 0$ for some $n > 0$.
(c) Prove that $R'$ is the zero ring if and only if $a$ is nilpotent (see Exercise 3.9).

5.7. Let $F$ be a field and let $R = F[t]$ be the polynomial ring. Let $R'$ be the ring extension
$R[x]/(tx - 1)$ obtained by adjoining an inverse of $t$ to $R$. Prove that this ring can be
identified as the ring of Laurent polynomials, which are finite linear combinations of
powers of $t$, negative exponents included.


Section 6 Product Rings 


6.1. Let $\varphi: \mathbb{R}[x] \to \mathbb{C} \times \mathbb{C}$ be the homomorphism defined by $\varphi(x) = (1, i)$ and $\varphi(r) = (r, r)$
for $r$ in $\mathbb{R}$. Determine the kernel and the image of $\varphi$.

6.2. Is $\mathbb{Z}/(6)$ isomorphic to the product ring $\mathbb{Z}/(2) \times \mathbb{Z}/(3)$? Is $\mathbb{Z}/(8)$ isomorphic to
$\mathbb{Z}/(2) \times \mathbb{Z}/(4)$?

6.3. Classify rings of order 10.

6.4. In each case, describe the ring obtained from the field $\mathbb{F}_2$ by adjoining an element $\alpha$
satisfying the given relation:
(a) $\alpha^2 + \alpha + 1 = 0$, (b) $\alpha^2 + 1 = 0$, (c) $\alpha^2 + \alpha = 0$.

6.5. Suppose we adjoin an element $\alpha$ satisfying the relation $\alpha^2 = 1$ to the real numbers $\mathbb{R}$.
Prove that the resulting ring is isomorphic to the product $\mathbb{R} \times \mathbb{R}$.

6.6. Describe the ring obtained from the product ring $\mathbb{R} \times \mathbb{R}$ by inverting the element $(2, 0)$.

6.7. Prove that in the ring $\mathbb{Z}[x]$, the intersection $(2) \cap (x)$ of the principal ideals $(2)$ and $(x)$
is the principal ideal $(2x)$, and that the quotient ring $R = \mathbb{Z}[x]/(2x)$ is isomorphic to the
subring of the product ring $\mathbb{F}_2[x] \times \mathbb{Z}$ of pairs $(f(x), n)$ such that $f(0) \equiv n$ modulo 2.

6.8. Let $I$ and $J$ be ideals of a ring $R$ such that $I + J = R$.
(a) Prove that $IJ = I \cap J$ (see Exercise 3.13).
(b) Prove the Chinese Remainder Theorem: For any pair $a$, $b$ of elements of $R$, there is an
element $x$ such that $x \equiv a$ modulo $I$ and $x \equiv b$ modulo $J$. (The notation $x \equiv a$ modulo
$I$ means $x - a \in I$.)
(c) Prove that if $IJ = 0$, then $R$ is isomorphic to the product ring $(R/I) \times (R/J)$.
(d) Describe the idempotents corresponding to the product decomposition in (c).


Section 7 Fractions 


7.1. Prove that a domain of finite order is a field.

7.2. Let $R$ be a domain. Prove that the polynomial ring $R[x]$ is a domain, and identify the units
in $R[x]$.

7.3. Is there a domain that contains exactly 15 elements?

7.4. Prove that the field of fractions of the formal power series ring $F[[x]]$ over a field $F$ can be
obtained by inverting the element $x$. Find a neat description of the elements of that field
(see Exercise 11.2.1).

7.5. A subset $S$ of a domain $R$ that is closed under multiplication and that does not contain 0 is
called a multiplicative set. Given a multiplicative set $S$, define $S$-fractions to be elements of
the form $a/b$, where $b$ is in $S$. Show that the equivalence classes of $S$-fractions form a ring.


Section 8 Maximal Ideals 


8.1. Which principal ideals in $\mathbb{Z}[x]$ are maximal ideals?

8.2. Determine the maximal ideals of each of the following rings:
(a) $\mathbb{R} \times \mathbb{R}$, (b) $\mathbb{R}[x]/(x^2)$, (c) $\mathbb{R}[x]/(x^2 - 3x + 2)$, (d) $\mathbb{R}[x]/(x^2 + x + 1)$.

8.3. Prove that the ring $\mathbb{F}_2[x]/(x^3 + x + 1)$ is a field, but that $\mathbb{F}_3[x]/(x^3 + x + 1)$ is not a field.

8.4. Establish a bijective correspondence between maximal ideals of $\mathbb{R}[x]$ and points in the
upper half plane.


Section 9 Algebraic Geometry


9.1. Let $I$ be the principal ideal of $\mathbb{C}[x, y]$ generated by the polynomial $y^2 + x^3 - 17$. Which of the
following sets generate maximal ideals in the quotient ring $R = \mathbb{C}[x, y]/I$? $(x - 1, y - 4)$,
$(x + 1, y + 4)$, $(x^3 - 17, y^2)$.

9.2. Let $f_1, \ldots, f_r$ be complex polynomials in the variables $x_1, \ldots, x_n$, let $V$ be the variety
of their common zeros, and let $I$ be the ideal of the polynomial ring $R = \mathbb{C}[x_1, \ldots, x_n]$
that they generate. Define a homomorphism from the quotient ring $\overline{R} = R/I$ to the ring
$\mathcal{R}$ of continuous, complex-valued functions on $V$.

9.3. Let $U = \{f_i(x_1, \ldots, x_m) = 0\}$, $V = \{g_j(y_1, \ldots, y_n) = 0\}$ be varieties in $\mathbb{C}^m$ and $\mathbb{C}^n$,
respectively. Show that the variety defined by the equations $\{f_i(x) = 0, g_j(y) = 0\}$ in
$x, y$-space $\mathbb{C}^{m+n}$ is the product set $U \times V$.

9.4. Let $U$ and $V$ be varieties in $\mathbb{C}^n$. Prove that the union $U \cup V$ and the intersection $U \cap V$
are varieties. What does the statement $U \cap V = \emptyset$ mean algebraically? What about the
statement $U \cup V = \mathbb{C}^n$?

9.5. Prove that the variety of zeros of a set $\{f_1, \ldots, f_r\}$ of polynomials depends only on the
ideal that they generate.

9.6. Prove that every variety in $\mathbb{C}^2$ is the union of finitely many points and algebraic curves.

9.7. Determine the points of intersection in $\mathbb{C}^2$ of the two loci in each of the following cases:
(a) $y^2 - x^3 + x^2 = 1$, $x + y = 1$, (b) $x^2 + xy + y^2 = 1$, $x^2 + 2y^2 = 1$,
(c) $y^2 = x^3$, $xy = 1$, (d) $x + y^2 = 0$, $y + x^2 + 2xy^2 + y^4 = 0$.
9.8. Which ideals in the polynomial ring $\mathbb{C}[x, y]$ contain $x^2 + y^2 - 5$ and $xy - 2$?

9.9. An irreducible plane algebraic curve $C$ is the locus of zeros in $\mathbb{C}^2$ of an irreducible
polynomial $f(x, y)$. A point $p$ of $C$ is a singular point of the curve if $f = \partial f/\partial x =
\partial f/\partial y = 0$ at $p$. Otherwise $p$ is a nonsingular point. Prove that an irreducible curve has
only finitely many singular points.

9.10. Let $L$ be the (complex) line $\{ax + by + c = 0\}$ in $\mathbb{C}^2$, and let $C$ be the algebraic curve
$\{f(x, y) = 0\}$, where $f$ is an irreducible polynomial of degree $d$. Prove $C \cap L$ contains at
most $d$ points unless $C = L$.

9.11. Let $C_1$ and $C_2$ be the zeros of quadratic polynomials $f_1$ and $f_2$ respectively that don’t
have a common linear factor.
(a) Let $p$ and $q$ be distinct points of intersection of $C_1$ and $C_2$, and let $L$ be the (complex)
line through $p$ and $q$. Prove that there are constants $c_1$ and $c_2$, not both zero, so that
$g = c_1 f_1 + c_2 f_2$ vanishes identically on $L$. Prove also that $g$ is the product of linear
polynomials.
Hint: Force $g$ to vanish at a third point of $L$.
(b) Prove that $C_1$ and $C_2$ have at most 4 points in common.

9.12. Prove in two ways that the three polynomials $f_1 = t^2 + x^2 - 2$, $f_2 = tx - 1$, $f_3 = t^3 + 5tx^2 + 1$
generate the unit ideal in $\mathbb{C}[t, x]$: by showing that they have no common zeros, and also by
writing 1 as a linear combination of $f_1$, $f_2$, $f_3$, with polynomial coefficients.

*9.13. Let $\varphi: \mathbb{C}[x, y] \to \mathbb{C}[t]$ be a homomorphism that is the identity on $\mathbb{C}$ and sends $x \mapsto x(t)$,
$y \mapsto y(t)$, and such that $x(t)$ and $y(t)$ are not both constant. Prove that the kernel of $\varphi$ is a
principal ideal.


Miscellaneous Exercises 


M.1. Prove or disprove: If $a^2 = a$ for every $a$ in a nonzero ring $R$, then $R$ has characteristic 2.

M.2. A semigroup $S$ is a set with an associative law of composition having an identity element.
Let $S$ be a commutative semigroup that satisfies the cancellation law: $ab = ac$ implies
$b = c$. Prove that $S$ can be embedded into a group.

M.3. Let $R$ denote the set of sequences $a = (a_1, a_2, a_3, \ldots)$ of real numbers that are eventually
constant: $a_n = a_{n+1} = \ldots$ for sufficiently large $n$. Addition and multiplication are
componentwise, that is, addition is vector addition and multiplication is defined by
$ab = (a_1 b_1, a_2 b_2, \ldots)$. Prove that $R$ is a ring, and determine its maximal ideals.

M.4. (a) Classify rings $R$ that contain $\mathbb{C}$ and have dimension 2 as vector space over $\mathbb{C}$.
(b) Do the same for rings that have dimension 3.

M.5. Define $\varphi: \mathbb{C}[x, y] \to \mathbb{C}[x] \times \mathbb{C}[y] \times \mathbb{C}[t]$ by $f(x, y) \mapsto (f(x, 0), f(0, y), f(t, t))$. Determine
the image of this map, and find generators for the kernel.

M.6. Prove that the locus $y = \sin x$ in $\mathbb{R}^2$ doesn’t lie on any algebraic curve in $\mathbb{C}^2$.

*M.7. Let $X$ denote the closed unit interval $[0, 1]$, and let $R$ be the ring of continuous functions
$X \to \mathbb{R}$.
(a) Let $f_1, \ldots, f_n$ be functions with no common zero on $X$. Prove that the ideal generated
by these functions is the unit ideal.
Hint: Consider $f_1^2 + \cdots + f_n^2$.
(b) Establish a bijective correspondence between maximal ideals of $R$ and points on the
interval.
CHAPTER 12


Factoring


You probably think that one knows everything about polynomials. 


—Serge Lang 


12.1 FACTORING INTEGERS 


We study division in rings in this chapter, modeling our investigation on properties of the 
ring of integers, and we begin by reviewing those properties. Some have been used without 
comment in earlier chapters of the book, and some have been proved before. 


A property from which many others follow is division with remainder: If a and b are
integers and a is positive, there exist integers q and r so that


(12.1.1) $b = aq + r$, and $0 \le r < a$.

We’ve seen some of its important consequences:


Theorem 12.1.2 
(a) Every ideal of the ring Z of integers is principal. 


(b) A pair a, b of integers, not both zero, has a greatest common divisor, a positive integer 
d with these properties: 


(i) Zd = Za + Zb, 

(ii) d divides a and d divides b, 
(iii) if an integer e divides a and b, then e divides d. 
(iv) There are integers r and s such that $d = ra + sb$.


(c) If a prime integer p divides a product ab of integers, then p divides a or p divides b.


(d) Fundamental Theorem of Arithmetic: Every positive integer $a \ne 1$ can be written as
a product $a = p_1 \cdots p_k$, where the $p_i$ are positive prime integers, and $k > 0$. This
expression is unique except for the ordering of the prime factors.


The proofs of these facts will be reviewed in a more general setting in the next section. 
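
As an illustration of Theorem 12.1.2(b)(iv), here is a minimal Python sketch that finds the greatest common divisor d of two integers together with integers r and s such that d = ra + sb, by repeated division with remainder. The function name `extended_gcd` and the sample inputs are chosen for this example only.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, r, s) with d = gcd(a, b) > 0 and d == r*a + s*b.

    Assumes a and b are not both zero.
    """
    # Invariants: old_rem == old_x*a + old_y*b and rem == x*a + y*b.
    old_rem, rem = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while rem != 0:
        q = old_rem // rem  # quotient of the division with remainder
        old_rem, rem = rem, old_rem - q * rem
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_rem < 0:  # normalize so that d is positive
        old_rem, old_x, old_y = -old_rem, -old_x, -old_y
    return old_rem, old_x, old_y


d, r, s = extended_gcd(314, 136)
print(d, r, s, r * 314 + s * 136)
```

The coefficients r and s are not unique; the sketch returns one valid pair.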
12.2 UNIQUE FACTORIZATION DOMAINS


It is natural to ask which rings have properties analogous to those of the ring of integers, 
and we investigate this question here. There are relatively few rings for which all parts of 
Theorem 12.1.2 can be extended, but polynomial rings over fields are important cases in 
which they do extend. 


When discussing factoring, we assume that the ring R is an integral domain, so that the 
Cancellation Law 11.7.1 is available, and we exclude the element zero from consideration. 
Here is some terminology that we use: 


(12.2.1)
u is a unit if u has a multiplicative inverse in R.
a divides b if b = aq for some q in R.
a is a proper divisor of b if b = aq and neither a nor q is a unit.
a and b are associates if each divides the other, or if b = ua, and u is a unit.
a is irreducible if a is not a unit, and it has no proper divisor: its only divisors are units and associates.
p is a prime element if p is not a unit, and whenever p divides a product ab, then p divides a or p divides b.


These concepts can be interpreted in terms of the principal ideals generated by the elements.
Recall that the principal ideal (a) generated by an element a consists of all elements of R
that are divisible by a. Then


(12.2.2)
u is a unit $\iff$ $(u) = (1)$.
a divides b $\iff$ $(b) \subset (a)$.
a is a proper divisor of b $\iff$ $(b) < (a) < (1)$.
a and b are associates $\iff$ $(a) = (b)$.
a is irreducible $\iff$ $(a) < (1)$, and there is no principal ideal $(c)$ such that $(a) < (c) < (1)$.
p is a prime element $\iff$ $ab \in (p)$ implies $a \in (p)$ or $b \in (p)$.


Before continuing, we note one of the simplest examples of a ring element that has
more than one factorization. The ring is $R = \mathbb{Z}[\sqrt{-5}]$. It consists of all complex numbers of
the form $a + b\sqrt{-5}$, where a and b are integers. We will use this ring as an example several
times in this chapter and the next. In R, the integer 6 can be factored in two ways:


(12.2.3) $2 \cdot 3 = 6 = (1 + \sqrt{-5})(1 - \sqrt{-5})$.


It isn’t hard to show that none of the four terms $2$, $3$, $1 + \sqrt{-5}$, $1 - \sqrt{-5}$ can be factored
further; they are irreducible elements of the ring.
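
One common way to check this irreducibility claim, sketched here under the assumption that we use the norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$ (which is multiplicative), is to observe that the four elements have norms 4, 9, 6, 6, so a proper divisor would need norm 2 or 3. The Python below searches for such elements; the names `norm` and `elements_of_norm` are illustrative.

```python
import math


def norm(a: int, b: int) -> int:
    """Norm of a + b*sqrt(-5): the product with its complex conjugate."""
    return a * a + 5 * b * b


def elements_of_norm(n: int) -> list[tuple[int, int]]:
    """All pairs (a, b) with a^2 + 5 b^2 == n, for a nonnegative integer n."""
    bound = math.isqrt(n)  # |a| and |b| cannot exceed sqrt(n)
    return [
        (a, b)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if norm(a, b) == n
    ]


# Norms of 2, 3, 1 + sqrt(-5), 1 - sqrt(-5).
print([norm(2, 0), norm(3, 0), norm(1, 1), norm(1, -1)])
# A proper divisor of any of them would have norm 2 or 3.
print(elements_of_norm(2), elements_of_norm(3))
# Elements of norm 1 are the units.
print(elements_of_norm(1))
```

Since no element has norm 2 or 3, any factorization of one of the four elements must contain a factor of norm 1, which is a unit.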


We abstract the procedure of division with remainder first. To make sense of division
with remainder, we need a measure of size of an element. A size function on an integral
domain R can be any function $\sigma$ whose domain is the set of nonzero elements of R, and
whose range is the set of nonnegative integers. An integral domain R is a Euclidean domain
if there is a size function $\sigma$ on R such that division with remainder is possible, in the following
sense:


(12.2.4) Let a and b be elements of R, and suppose that a is not zero.
There are elements q and r in R such that $b = aq + r$,
and either $r = 0$ or else $\sigma(r) < \sigma(a)$.


The most important fact about division with remainder is that r is zero if and only if a
divides b.


Proposition 12.2.5


(a) The ring $\mathbb{Z}$ of integers is a Euclidean domain, with size function $\sigma(a) = |a|$.

(b) A polynomial ring $F[x]$ in one variable over a field F is a Euclidean domain, with
$\sigma(f)$ = degree of $f$.

(c) The ring $\mathbb{Z}[i]$ of Gauss integers is a Euclidean domain, with $\sigma(a) = |a|^2$.


The ring of integers and the polynomial rings were discussed in Chapter 11. We show
here that the ring of Gauss integers is a Euclidean domain. The elements of $\mathbb{Z}[i]$ form a
square lattice in the complex plane, and the multiples of a given nonzero element $\alpha$ form
the principal ideal $(\alpha)$, which is a similar geometric figure. If we write $\alpha = re^{i\theta}$, then $(\alpha)$ is
obtained from the lattice $\mathbb{Z}[i]$ by rotating through the angle $\theta$ and stretching by the factor r,
as is illustrated below with $\alpha = 2 + i$:


(12.2.6) [Figure: A Principal Ideal in the Ring of Gauss Integers.]


For any complex number $\beta$, there is a point of the lattice $(\alpha)$ whose square distance from $\beta$
is less than $|\alpha|^2$. We choose such a point, say $\gamma = \alpha q$, and let $r = \beta - \gamma$. Then $\beta = \alpha q + r$,
and $|r|^2 < |\alpha|^2$, as required. Here q is in $\mathbb{Z}[i]$, and if $\beta$ is in $\mathbb{Z}[i]$, so is r.

Division with remainder is not unique: There may be as many as four choices for the
element $\gamma$. □
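
The argument above can be turned into a short computation. The following Python sketch represents a Gauss integer x + yi as a pair (x, y) and picks the lattice point by rounding the exact quotient β/α coordinatewise, which is one possible choice among the up to four mentioned. The name `gauss_divmod` and the pair representation are assumptions of this example.

```python
def gauss_divmod(beta, alpha):
    """Division with remainder in Z[i]: return (q, r) with beta = alpha*q + r
    and |r|^2 < |alpha|^2. Gauss integers x + y*i are pairs (x, y)."""
    (b1, b2), (a1, a2) = beta, alpha
    n = a1 * a1 + a2 * a2  # |alpha|^2
    if n == 0:
        raise ZeroDivisionError("alpha must be nonzero")
    # beta/alpha = beta * conj(alpha) / n, where
    # beta * conj(alpha) = (b1*a1 + b2*a2) + (b2*a1 - b1*a2) i.
    p = b1 * a1 + b2 * a2
    t = b2 * a1 - b1 * a2
    # Round each coordinate of the exact quotient to a nearest integer.
    q1 = (2 * p + n) // (2 * n)
    q2 = (2 * t + n) // (2 * n)
    # r = beta - alpha*q, with alpha*q = (a1*q1 - a2*q2) + (a1*q2 + a2*q1) i.
    r1 = b1 - (a1 * q1 - a2 * q2)
    r2 = b2 - (a1 * q2 + a2 * q1)
    return (q1, q2), (r1, r2)


q, r = gauss_divmod((7, 3), (2, 1))
print(q, r)
```

Rounding puts each coordinate of the exact quotient within 1/2 of q, so |r|² is at most |α|²/2, which is stronger than the required inequality.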


• An integral domain in which every ideal is principal is called a principal ideal domain.
Proposition 12.2.7 A Euclidean domain is a principal ideal domain.


Proof. We mimic the proof that the ring of integers is a principal ideal domain once more.
Let R be a Euclidean domain with size function $\sigma$, and let A be an ideal of R. We must
show that A is principal. The zero ideal is principal, so we may assume that A is not the zero
ideal. Then A contains a nonzero element. We choose a nonzero element a of A such that
$\sigma(a)$ is as small as possible, and we show that A is the principal ideal (a) of multiples of a.
Because A is an ideal and a is in A, any multiple aq with q in R is in A. So $(a) \subset A$. To
show that $A \subset (a)$, we take an arbitrary element b of A. We use division with remainder to
write $b = aq + r$, where either $r = 0$, or $\sigma(r) < \sigma(a)$. Then b and aq are in A, so $r = b - aq$
is in A too. Since $\sigma(a)$ is minimal, we can’t have $\sigma(r) < \sigma(a)$, and it follows that $r = 0$. This
shows that a divides b, and hence that b is in the principal ideal (a). Since b is arbitrary,
$A \subset (a)$, and therefore $A = (a)$. □


Let a and b be elements of an integral domain R, not both zero. A greatest common 
divisor d of a and b is an element with the following properties: 


(a) d divides a and b. 
(b) If an element e divides a and b, then e divides d. 


Any two greatest common divisors d and d’ are associate elements. The first condition tells 
us that both d and d’ divide a and b, and then the second one tells us that d’ divides d and 
also that d divides d’. 

However, a greatest common divisor may not exist. There will often be a common
divisor m that is maximal, meaning that a/m and b/m have no proper divisor in common. But
this element may fail to satisfy condition (b). For instance, in the ring $\mathbb{Z}[\sqrt{-5}]$ considered
above (12.2.3), the elements $a = 6$ and $b = 2 + 2\sqrt{-5}$ are divisible both by 2 and by
$1 + \sqrt{-5}$. These are maximal elements among common divisors, but neither one divides
the other.
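
The divisibility claims in this example can be checked mechanically. The Python sketch below assumes the representation of $c + d\sqrt{-5}$ as a pair (c, d) and tests whether x divides y by computing $y\bar{x}/N(x)$ and checking that both coordinates are integers; the helper name `divides` is hypothetical.

```python
def divides(x, y) -> bool:
    """True if x divides y in Z[sqrt(-5)]; elements c + d*sqrt(-5) are pairs (c, d)."""
    (a, b), (c, d) = x, y
    n = a * a + 5 * b * b  # norm of x
    if n == 0:
        raise ZeroDivisionError("x must be nonzero")
    # y * conj(x) = (c*a + 5*d*b) + (d*a - c*b) sqrt(-5); y/x is this over n.
    return (c * a + 5 * d * b) % n == 0 and (d * a - c * b) % n == 0


two = (2, 0)
w = (1, 1)   # 1 + sqrt(-5)
a = (6, 0)   # 6
b = (2, 2)   # 2 + 2 sqrt(-5)

# Both 2 and 1 + sqrt(-5) are common divisors of a and b ...
print(divides(two, a), divides(two, b), divides(w, a), divides(w, b))
# ... but neither divides the other.
print(divides(two, w), divides(w, two))
```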

One case in which a greatest common divisor does exist is that a and b have no common 
factors except units. Then 1 is a greatest common divisor. When this is so, a and b are said 
to be relatively prime. 

Greatest common divisors always exist in a principal ideal domain: 


Proposition 12.2.8 Let R be a principal ideal domain, and let a and b be elements of R,
which are not both zero. An element d that generates the ideal $(a, b) = Ra + Rb$ is a
greatest common divisor of a and b. It has these properties:

(a) $Rd = Ra + Rb$,

(b) d divides a and b.

(c) If an element e of R divides both a and b, it also divides d.

(d) There are elements r and s in R such that $d = ra + sb$.


Proof. This is essentially the same proof as for the ring of integers. (a) restates that d
generates the ideal (a, b). (b) states that a and b are in Rd, and (d) states that d is in the
ideal Ra + Rb. For (c), we note that if e divides a and b then a and b are elements of Re.
In that case, Re contains Ra + Rb = Rd, so e divides d. □
Corollary 12.2.9 Let R be a principal ideal domain.


(a) If elements a and b of R are relatively prime, then 1 is a linear combination ra + sb. 
(b) An element of R is irreducible if and only if it is a prime element. 
(c) The maximal ideals of R are the principal ideals generated by the irreducible elements. 


Proof. (a) This follows from Proposition 12.2.8(d). 


(b) In any integral domain, a prime element is irreducible. We prove this below, in Lemma
12.2.10. Suppose that R is a principal ideal domain and that an irreducible element q of R
divides a product ab. We have to show that if q does not divide a, then q divides b. Let d be
a greatest common divisor of a and q. Since q is irreducible, the divisors of q are the units
and the associates of q. Since q does not divide a, d is not an associate of q. So d is a unit, q
and a are relatively prime, and $1 = ra + sq$ with r and s in R. We multiply by b: $b = rab + sqb$.
Both terms on the right side of this equation are divisible by q, so q divides the left side, b.


(c) Let q be an irreducible element. Its divisors are units and associates. Therefore the only
principal ideals that contain (q) are (q) itself and the unit ideal (1) (see (12.2.2)). Since
every ideal of R is principal, these are the only ideals that contain (q). Therefore (q) is a
maximal ideal. Conversely, if an element b has a proper divisor a, then $(b) < (a) < (1)$, so
(b) is not a maximal ideal. □


Lemma 12.2.10 In an integral domain R, a prime element is irreducible. 


Proof. Suppose that a prime element p is a product, say p = ab. Then p divides one of the
factors, say a. But the equation p = ab shows that a divides p too. So a and p are associates
and b is a unit. The factorization is not proper. □


What analogy to the Fundamental Theorem of Arithmetic 12.1.2(d) could one hope for 
in an integral domain? We may divide the desired statement of uniqueness of factorization 
into two parts. First, a given element should be a product of irreducible elements, and 
second, that product should be essentially unique. 

Units in a ring complicate the statement of uniqueness. Unit factors must be disregarded 
and associate factors must be considered equivalent. The units in the ring of integers are
±1, and in this ring it is natural to work with positive integers. Similarly, in the polynomial
ring F[x] over a field, it is natural to work with monic polynomials. But we don’t have a 
reasonable way to normalize elements in an arbitrary integral domain; it is best not to try. 

We say that factoring in an integral domain R is unique if, whenever an element a of 
R is written in two ways as a product of irreducible elements, say 


(12.2.11) $p_1 \cdots p_m = a = q_1 \cdots q_n$,


then m = n, and if the right side is rearranged suitably, $q_i$ is an associate of $p_i$ for each i. So
in the statement of uniqueness, associate factorizations are considered equivalent.
For example, in the ring of Gauss integers,


$(2 + i)(2 - i) = 5 = (1 + 2i)(1 - 2i)$.


These two factorizations of the element 5 are equivalent because the terms that appear on
the left and right sides are associates: $-i(2 + i) = 1 - 2i$ and $i(2 - i) = 1 + 2i$.
It is neater to work with principal ideals than with elements, because associates generate
the same principal ideal. However, it isn’t too cumbersome to use elements and we will stay 
with them here. The importance of ideals will become clear in the next chapter. 

When we attempt to write an element a as a product of irreducible elements, we always 
assume that it is not zero and not a unit. Then we attempt to factor a, proceeding this way: If 
a is irreducible, we stop. If not, then a has a proper factor, so it decomposes in some way as 
a product, say $a = a_1 b_1$, where neither $a_1$ nor $b_1$ is a unit. We continue factoring $a_1$ and $b_1$,
if possible, and we hope that this procedure terminates; in other words, we hope that after a 
finite number of steps all the factors are irreducible. We say that factoring terminates in R if 
this is always true, and we refer to a factorization into irreducible elements as an irreducible 
factorization. 
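
For the ring of integers, the procedure just described can be sketched in a few lines of Python. This is only an illustration for positive integers a > 1, where "irreducible" means prime; the helper names `proper_divisor` and `factor` are invented for the example, and trial division is used as one simple way to find a proper divisor.

```python
def proper_divisor(a: int):
    """Return a divisor d of a with 1 < d < a, or None if a > 1 has none."""
    d = 2
    while d * d <= a:
        if a % d == 0:
            return d
        d += 1
    return None


def factor(a: int) -> list[int]:
    """Irreducible factorization of an integer a > 1, found by repeated splitting."""
    if a < 2:
        raise ValueError("a must be an integer greater than 1")
    d = proper_divisor(a)
    if d is None:
        return [a]  # a is irreducible: stop
    # Otherwise a = d * (a // d) is a proper factorization; factor both parts.
    return sorted(factor(d) + factor(a // d))


print(factor(360))
```

The recursion ends because both factors in a proper factorization of a positive integer are strictly smaller than a; this is the sense in which factoring terminates in the ring of integers.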

An integral domain R is a unique factorization domain if it has these properties: 


(12.2.12)
• Factoring terminates.
• The irreducible factorization of an element a is unique in the sense described above.


The condition that factoring terminates has a useful description in terms of principal 
ideals: 


Proposition 12.2.13 Let R be an integral domain. The following conditions are equivalent: 


• Factoring terminates.


• R does not contain an infinite strictly increasing chain $(a_1) < (a_2) < (a_3) < \cdots$ of
principal ideals.


Proof. If the process of factoring doesn’t terminate, there will be an element $a_1$ with a
proper factorization such that the process fails to terminate for at least one of the factors.
Let’s say that the proper factorization is $a_1 = a_2 b_2$, and that the process fails to terminate
for the factor we call $a_2$. Since $a_2$ is a proper divisor of $a_1$, $(a_1) < (a_2)$ (see (12.2.2)). We
replace $a_1$ by $a_2$ and repeat. In this way we obtain an infinite chain.


Conversely, if there is a strictly increasing chain $(a_1) < (a_2) < \cdots$, then none of the
ideals $(a_n)$ is the unit ideal, and therefore $a_2$ is a proper divisor of $a_1$, $a_3$ is a proper divisor
of $a_2$, and so on (12.2.2). This gives us a nonterminating process. □


We will rarely encounter rings in which factoring fails to terminate, and we will prove 
a theorem that explains the reason later (see (14.6.9)), so we won’t worry much about it 
here. In practice it is the uniqueness that gives trouble. Factoring into irreducible elements 
will usually be possible, but it will not be unique, even when one takes into account the 
ambiguity of associate factors. 


Going back to the ring $R = \mathbb{Z}[\sqrt{-5}]$, it isn’t hard to show that all of the elements $2$, $3$,
$1 + \sqrt{-5}$ and $1 - \sqrt{-5}$ are irreducible, and that the units of R are 1 and -1. So 2 is not an
associate of $1 + \sqrt{-5}$ or of $1 - \sqrt{-5}$. Therefore $2 \cdot 3 = 6 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ are essentially
different factorizations: R is not a unique factorization domain.




Proposition 12.2.14 


(a) Let R be an integral domain. Suppose that factoring terminates in R. Then R is a unique
factorization domain if and only if every irreducible element is a prime element.


(b) A principal ideal domain is a unique factorization domain. 


(c) The rings Z, Z[i] and the polynomial ring F[x] in one variable over a field F are unique 
factorization domains. 


Thus the phrases irreducible factorization and prime factorization are synonymous in 
unique factorization domains, but most rings contain irreducible elements that are not prime. 
In the ring $\mathbb{Z}[\sqrt{-5}]$, the element 2 is irreducible. It is not prime because, though it divides the
product $(1 + \sqrt{-5})(1 - \sqrt{-5})$, it does not divide either factor.

The converse of (b) is not true. We will see in the next section that the ring Z[x] of 
integer polynomials is a unique factorization domain, though it isn’t a principal ideal domain. 


Proof of Proposition (12.2.14). First of all, (c) follows from (b) because the rings mentioned
in (c) are Euclidean domains, and therefore principal ideal domains.


(a) Let R be a ring in which every irreducible element is prime, and suppose that an element
a factors in two ways into irreducible elements, say $p_1 \cdots p_m = a = q_1 \cdots q_n$, where $m \le n$.
If n = 1, then m = 1 and $p_1 = q_1$. Suppose that n > 1. Since $p_1$ is prime, it divides one of
the factors $q_1, \ldots, q_n$, say $q_1$. Since $q_1$ is irreducible and since $p_1$ is not a unit, $q_1$ and $p_1$ are
associates, say $p_1 = uq_1$, where u is a unit. We move the unit factor over to $q_2$, replacing
$q_1$ by $uq_1$ and $q_2$ by $u^{-1}q_2$. The result is that now $p_1 = q_1$. Then we cancel $p_1$ and use
induction on n.

Conversely, suppose that there is an irreducible element p that is not prime. Then 
there are elements a and b such that p divides the product r = ab, say r = pc, but p 
does not divide a or b. By factoring a, b, and c into irreducible elements, we obtain two 
inequivalent factorizations of r. 


(b) Let R be a principal ideal domain. Since every irreducible element of R is prime (12.2.9),
we need only prove that factoring terminates (12.2.14). We do this by showing that R
contains no infinite strictly increasing chain of principal ideals (12.2.13). We suppose given an
infinite weakly increasing chain

$(a_1) \subset (a_2) \subset (a_3) \subset \cdots$,


and we prove that it cannot be strictly increasing.


Lemma 12.2.15 Let $I_1 \subset I_2 \subset I_3 \subset \cdots$ be an increasing chain of ideals in a ring R. The union
$J = \bigcup I_n$ is an ideal.


Proof. If u and v are in J, they are both in $I_n$ for some n. Then u + v and ru, for any r in
R, are also in $I_n$, and therefore they are in J. This shows that J is an ideal. □


We apply this lemma to our chain of principal ideals, with $I_n = (a_n)$, and we use the
hypothesis that R is a principal ideal domain to conclude that the union J is a principal
ideal, say $J = (b)$. Then since b is in the union of the ideals $(a_n)$, it is in one of those ideals.
But if b is in $(a_n)$, then $(b) \subset (a_n)$. On the other hand, $(a_n) \subset (a_{n+1}) \subset (b)$. Therefore
$(b) = (a_n) = (a_{n+1})$. The chain is not strictly increasing. □


One can decide whether an element a divides another element b in a unique factorization 
domain, in terms of their irreducible factorizations. 


Proposition 12.2.16 Let R be a unique factorization domain. 


(a) Let $a = p_1 \cdots p_m$ and $b = q_1 \cdots q_n$ be irreducible factorizations of two elements of
R. Then a divides b in R if and only if $m \le n$ and, when the factors $q_j$ are arranged
suitably, $p_i$ is an associate of $q_i$ for $i = 1, \ldots, m$.

(b) Any pair of elements a, b, not both zero, has a greatest common divisor. 


Proof. (a) This is very similar to the proof of Proposition 12.2.14(a). The irreducible factors
of a are prime elements. If a divides b, then $p_1$ divides b, and therefore $p_1$ divides some $q_j$,
say $q_1$. Then $p_1$ and $q_1$ are associates. The assertion follows by induction when we cancel $p_1$
from a and $q_1$ from b. We omit the proof of (b). □


Note: Any two greatest common divisors of a and b are associates. But unless a unique 
factorization domain is a principal ideal domain, the greatest common divisor, though it 
exists, needn’t have the form ra + sb. The greatest common divisor of 2 and x in the unique 
factorization domain Z[x] is 1, but we cannot write 1 as a linear combination of those 
elements with integer polynomials as coefficients. O 
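
For the unique factorization domain of integers, Proposition 12.2.16 can be illustrated by comparing multisets of prime factors. The Python sketch below is restricted to positive integers, where associates can be ignored; the function names are invented for this example, and the greatest common divisor is formed by taking each prime to the smaller of its two multiplicities.

```python
from collections import Counter


def prime_factors(n: int) -> Counter:
    """Multiset of prime factors of a positive integer n, as prime -> multiplicity."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors


def divides(a: int, b: int) -> bool:
    """a divides b iff every prime of a occurs in b with at least the same multiplicity."""
    fa, fb = prime_factors(a), prime_factors(b)
    return all(fb[p] >= e for p, e in fa.items())


def gcd_from_factorizations(a: int, b: int) -> int:
    """Greatest common divisor built from the common prime factors."""
    common = prime_factors(a) & prime_factors(b)  # minimum of multiplicities
    d = 1
    for p, e in common.items():
        d *= p ** e
    return d


print(divides(12, 360), divides(16, 360), gcd_from_factorizations(84, 360))
```

This is practical only when factorizations are available; in a Euclidean domain the Euclidean algorithm finds a greatest common divisor without factoring.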


We review the results we have obtained for the important case of a polynomial ring 
F[x] over a field. The units in the polynomial ring F[x] are the nonzero constants. We can 
factor the leading coefficient out of a nonzero polynomial to make it monic, and the only 
monic associate of a monic polynomial f is f itself. By working with monic polynomials, 
the ambiguity of associate factorizations can be avoided. With this taken into account, the 
next theorem follows from Proposition 12.2.14. 


Theorem 12.2.17 Let F[x] be the polynomial ring in one variable over a field F. 


(a) Two polynomials f and g, not both zero, have a unique monic greatest common divisor
d, and there are polynomials r and s such that $rf + sg = d$.


(b) If two polynomials f and g have no nonconstant factor in common, then there are 
polynomials r and s such that rf + sg = 1. 

(c) Every irreducible polynomial p in F[x] is a prime element of F[x]: If p divides a 
product fg, then p divides f or p divides g. 

(d) Unique factorization: Every monic polynomial in $F[x]$ can be written as a product
$p_1 \cdots p_k$, where $p_i$ are monic irreducible polynomials in $F[x]$ and $k \ge 0$. This
factorization is unique except for the ordering of the terms. □


In the future, when we speak of the greatest common divisor of two polynomials with 
coefficients in a field, we will mean the unique monic polynomial with the properties (a) 
above. This greatest common divisor will sometimes be denoted by gcd( f, g). 

The greatest common divisor gcd(f, g) of two polynomials f and g, not both zero,
with coefficients in a field F can be found by repeated division with remainder, the process
called the Euclidean algorithm that we mentioned in Section 2.3 for the ring of integers:
Suppose that the degree of g is at least equal to the degree of f. We write $g = fq + r$ where
the remainder r, if it is not zero, has degree less than that of f. Then gcd(f, g) = gcd(f, r).
If r = 0, gcd(f, g) = f. If not, we replace f and g by r and f, and repeat the process.
Since degrees are being lowered, the process is finite. The analogous method can be used to
determine greatest common divisors in any Euclidean domain.
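
The Euclidean algorithm for polynomials can be sketched as follows, assuming the field is the rational numbers and a polynomial is stored as a list of coefficients with the constant term first (so `[-1, 0, 1]` stands for x² − 1). These conventions and the names `trim`, `poly_divmod` and `poly_gcd` are choices made for this example.

```python
from fractions import Fraction


def trim(p):
    """Drop zero leading coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(g, f):
    """Return (q, r) with g = f*q + r and r zero or of degree less than deg f."""
    f, r = trim(f), trim(g)
    if not f:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(f) + 1, 0)
    while len(r) >= len(f):
        shift = len(r) - len(f)
        c = Fraction(r[-1]) / Fraction(f[-1])  # cancels the leading term of r
        q[shift] = c
        for i, fi in enumerate(f):
            r[i + shift] -= c * fi
        r = trim(r)
    return q, r


def poly_gcd(f, g):
    """Monic greatest common divisor of f and g, not both zero."""
    f, g = trim(f), trim(g)
    while g:
        _, r = poly_divmod(f, g)
        f, g = g, r
    if not f:
        raise ValueError("f and g must not both be zero")
    lead = Fraction(f[-1])
    return [Fraction(c) / lead for c in f]  # normalize to a monic polynomial


# gcd(x^3 - 1, x^2 - 1) is x - 1.
print(poly_gcd([-1, 0, 0, 1], [-1, 0, 1]))
```

Exact rational arithmetic is used so that the remainder's leading coefficient cancels exactly; floating-point coefficients would make the zero test unreliable.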


Over the complex numbers, every polynomial of positive degree has a root $\alpha$, and
therefore a divisor of the form $x - \alpha$. The irreducible polynomials are linear, and the
irreducible factorization of a monic polynomial has the form


(12.2.18) $f(x) = (x - \alpha_1) \cdots (x - \alpha_n)$,


where $\alpha_i$ are the roots of $f(x)$, with repetitions for multiple roots. The uniqueness of this
factorization is not surprising.

When F = R, there are two classes of irreducible polynomials: linear and quadratic. A 
real quadratic polynomial x^2 + bx + c is irreducible if and only if its discriminant b^2 − 4c
is negative, in which case it has a pair of complex conjugate roots. The fact that every 
irreducible polynomial over the complex numbers is linear implies that no real polynomial 
of degree >2 is irreducible. 


Proposition 12.2.19 Let α be a complex, not real, root of a real polynomial f. Then the
complex conjugate ᾱ is also a root of f. The quadratic polynomial g = (x − α)(x − ᾱ) has
real coefficients, and it divides f. □


Factoring polynomials in the ring Q[x] of polynomials with rational coefficients is more 
interesting, because there exist irreducible polynomials in Q[x] of arbitrary degree. This is 
explained in the next two sections. Neither the form of the irreducible factorization nor its 
uniqueness are intuitively clear in this case. 

For future reference, we note the following elementary fact: 


Proposition 12.2.20 A polynomial f of degree n with coefficients in a field F has at most n
roots in F.


Proof. An element α is a root of f if and only if x − α divides f (11.2.11). If so, we can
write f(x) = (x − α)q(x), where q(x) is a polynomial of degree n − 1. Let β be a root of
f different from α. Substituting x = β, we obtain 0 = (β − α)q(β). Since β is not equal
to α, it must be a root of q. By induction on the degree, q has at most n − 1 roots in F.
Putting those roots together with α, we see that f has at most n roots. □


12.3. GAUSS’S LEMMA 


Every monic polynomial f(x) with rational coefficients can be expressed uniquely in the
form p_1 ··· p_k, where p_i are monic polynomials that are irreducible elements in the ring
Q[x]. But suppose that a polynomial f(x) has integer coefficients, and that it factors in Q[x].
Can it be factored without leaving the ring Z[x] of integer polynomials? We will see that it
can, and also that Z[x] is a unique factorization domain.

Here is an example of an irreducible factorization in integer polynomials:

6x^3 + 9x^2 + 9x + 3 = 3(2x + 1)(x^2 + x + 1).


As we see, irreducible factorizations are slightly more complicated in Z[x] than in Q[x]. 
Prime integers are irreducible elements of Z[x], and they may appear in the factorization of a 
polynomial. And, if we want to stay with integer coefficients, we can’t require monic factors. 


We have two main tools for studying factoring in Z[x]. The first is the inclusion of the 
integer polynomial ring into the ring of polynomials with rational coefficients: 


Z[x] ⊂ Q[x].


This can be useful because algebra in the ring Q[x] is simpler. 
The second tool is reduction modulo some integer prime p, the homomorphism 


(12.3.1) ψ_p: Z[x] → F_p[x]


that sends x ⇝ x (11.3.6). We’ll often denote the image ψ_p(f) of an integer polynomial by
f̄, though this notation is ambiguous because it doesn’t mention p.
The next lemma should be clear. 


Lemma 12.3.2. Let f(x) = a_n x^n + ··· + a_1 x + a_0 be an integer polynomial, and let p be an
integer prime. The following are equivalent:

• p divides every coefficient a_i of f in Z,

• p divides f in Z[x],

• f is in the kernel of ψ_p. □


The lemma shows that the kernel of ψ_p can be interpreted easily without mentioning
the map. But the facts that ψ_p is a homomorphism and that its image F_p[x] is an integral
domain make the interpretation as a kernel useful.

• A polynomial f(x) = a_n x^n + ··· + a_1 x + a_0 with rational coefficients is called primitive if it
is an integer polynomial of positive degree, the greatest common divisor of its coefficients
a_0, ..., a_n in the integers is 1, and its leading coefficient a_n is positive.


Lemma 12.3.3. Let f be an integer polynomial of positive degree, with positive leading
coefficient. The following conditions are equivalent:


• f is primitive,
• f is not divisible by any integer prime p,
• for every integer prime p, ψ_p(f) ≠ 0. □


Proposition 12.3.4 


(a) An integer is a prime element of Z[x] if and only if it is a prime integer. So a prime 
integer p divides a product fg of integer polynomials if and only if p divides f or p 
divides g. 


(b) (Gauss’s Lemma) The product of primitive polynomials is primitive.


Proof. (a) It is obvious that an integer must be a prime if it is an irreducible element of Z[x].
Let p be a prime integer. We use bar notation: f̄ = ψ_p(f). Then p divides fg if and only if
f̄ḡ = 0, and since F_p[x] is a domain, this is true if and only if f̄ = 0 or ḡ = 0, i.e., if and only
if p divides f or p divides g.


(b) Suppose that f and g are primitive polynomials. Since their leading coefficients are 
positive, the leading coefficient of fg is also positive. Moreover, no prime p divides f or g, 
and by (a), no prime divides fg. So fg is primitive. □


Lemma 12.3.5 Every polynomial f(x) of positive degree with rational coefficients can be
written uniquely as a product f(x) = c f_0(x), where c is a rational number and f_0(x) is a
primitive polynomial. Moreover, c is an integer if and only if f is an integer polynomial. If
f is an integer polynomial, then the greatest common divisor of the coefficients of f is ±c.


Proof. To find f_0, we first multiply f by an integer d to clear the denominators in its
coefficients. This will give us a polynomial df = f_1 with integer coefficients. Then we factor
out the greatest common divisor of the coefficients of f_1 and adjust the sign of the leading
coefficient. The resulting polynomial f_0 is primitive, and f = c f_0 for some rational number
c. This proves existence.

If f is an integer polynomial, we don’t need to clear the denominator. Then c will be 
an integer, and up to sign, it is the greatest common divisor of the coefficients, as stated. 

The uniqueness of this product is important, so we check it carefully. Suppose given
rational numbers c and c′ and primitive polynomials f_0 and f_0′ such that c f_0 = c′ f_0′. We
will show that f_0 = f_0′. Since Q[x] is a domain, it will follow that c = c′.

We multiply the equation c f_0 = c′ f_0′ by an integer and adjust the sign if necessary, to
reduce to the case that c and c′ are positive integers. If c ≠ 1, we choose a prime integer p
that divides c. Then p divides c′ f_0′. Proposition 12.3.4(a) shows that p divides one of the
factors c′ or f_0′. Since f_0′ is primitive, it isn’t divisible by p, so p divides c′. We cancel p
from both sides of the equation. Induction reduces us to the case that c = 1, and the same
reasoning shows that then c′ = 1. So f_0 = f_0′. □
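
The existence part of Lemma 12.3.5 is constructive, and the construction can be sketched in Python. The names content_and_primitive and poly_mul, and the convention of listing coefficients from lowest degree to highest, are assumptions made for this illustration only. The last line checks Gauss's Lemma on one pair of primitive polynomials; it is a spot check, not a proof.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def content_and_primitive(coeffs):
    """Write a rational polynomial f of positive degree as f = c * f0,
    with c rational and f0 primitive. Coefficients are lowest degree first."""
    coeffs = [Fraction(a) for a in coeffs]
    # Clear denominators: d * f has integer coefficients.
    d = reduce(lambda x, y: x * y // gcd(x, y),
               (a.denominator for a in coeffs), 1)
    ints = [int(a * d) for a in coeffs]
    g = reduce(gcd, ints)                 # positive gcd of the coefficients
    sign = 1 if ints[-1] > 0 else -1      # make the leading coefficient positive
    f0 = [sign * (a // g) for a in ints]
    return Fraction(sign * g, d), f0

def poly_mul(p, q):
    """Product of two integer polynomials."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# 6x^3 + 9x^2 + 9x + 3 = 3 * (2x^3 + 3x^2 + 3x + 1)
c, f0 = content_and_primitive([3, 9, 9, 6])

# (3/2)x^2 - 3/4 = (3/4) * (2x^2 - 1)
c2, g0 = content_and_primitive([Fraction(-3, 4), 0, Fraction(3, 2)])

# Gauss's Lemma: the product of the two primitive parts has content 1.
c3, _ = content_and_primitive(poly_mul(f0, g0))
```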


Theorem 12.3.6 


(a) Let f_0 be a primitive polynomial, and let g be an integer polynomial. If f_0 divides g in
Q[x], then f_0 divides g in Z[x].

(b) If two integer polynomials f and g have a common nonconstant factor in Q[x], they 
have a common nonconstant factor in Z[x]. 


Proof. (a) Say that g = f_0 q where q has rational coefficients. We show that q has integer
coefficients. We write g = c g_0, and q = c′ q_0, with g_0 and q_0 primitive. Then c g_0 = c′ f_0 q_0.
Gauss’s Lemma tells us that f_0 q_0 is primitive. Therefore by the uniqueness assertion of
Lemma 12.3.5, c = c′ and g_0 = f_0 q_0. Since g is an integer polynomial, c is an integer. So
q = c q_0 is an integer polynomial.


(b) If the integer polynomials f and g have a common factor h in Q[x] and if we write
h = c h_0, where h_0 is primitive, then h_0 also divides f and g in Q[x], and by (a), h_0 divides
both f and g in Z[x]. □


Proposition 12.3.7


(a) Let f be an integer polynomial with positive leading coefficient. Then f is an irreducible 
element of Z[x] if and only if it is either a prime integer or a primitive polynomial that 
is irreducible in Q[x]. 

(b) Every irreducible element of Z[x] is a prime element. 


Proof. Proposition 12.3.4(a) proves (a) and (b) for a constant polynomial. If f is irreducible
and not constant, it cannot have an integer factor different from ±1, so if its leading coefficient
is positive, it will be primitive. Suppose that f is a primitive polynomial and that it has a
proper factorization in Q[x], say f = gh. We write g = c g_0 and h = c′ h_0, with g_0 and h_0
primitive. Then g_0 h_0 is primitive. Since f = cc′ g_0 h_0 and f is also primitive, the uniqueness
assertion of Lemma 12.3.5 shows that f = g_0 h_0. Therefore f has a
proper factorization in Z[x] too. So if f is reducible in Q[x], it is reducible in Z[x]. The fact
that a primitive polynomial that is reducible in Z[x] is also reducible in Q[x] is clear. This
proves (a).

Let f be a primitive irreducible polynomial that divides a product gh of integer
polynomials. Then f is irreducible in Q[x]. Since Q[x] is a principal ideal domain, f is a
prime element of Q[x] (12.2.8). So f divides g or h in Q[x]. By (12.3.6) f divides g or h in
Z[x]. This shows that f is a prime element, which proves (b). □


Theorem 12.3.8 The polynomial ring Z[x] is a unique factorization domain. Every nonzero
polynomial f(x) ∈ Z[x] that is not ±1 can be written as a product


f(x) = ±p_1 ··· p_m q_1(x) ··· q_n(x),


where p_i are integer primes and q_j(x) are primitive irreducible polynomials. This expression
is unique except for the order of the factors.


Proof. It is easy to see that factoring terminates in Z[x], so this theorem follows from
Propositions 12.3.7 and 12.2.14. □


The results of this section have analogues for the polynomial ring F[t, x] in two
variables over a field F. To set up the analogy, we regard F[t, x] as the ring F[t][x] of
polynomials in x whose coefficients are polynomials in t. The analogue of the field Q will be
the field F(t) of rational functions in t, the field of fractions of F[t]. We’ll denote this field
by ℱ. Then F[t, x] is a subring of the ring ℱ[x] of polynomials


f = a_n(t) x^n + ··· + a_1(t) x + a_0(t)


whose coefficients a_i(t) are rational functions in t. This can be useful because every ideal of
ℱ[x] is principal.

The polynomial f is called primitive if it has positive degree, its coefficients a_i(t) are
polynomials in F[t] whose greatest common divisor is equal to 1, and the leading coefficient
a_n(t) is monic. A primitive polynomial will be an element of the polynomial ring F[t, x].

It is true again that the product of primitive polynomials is primitive, and that every
element f(t, x) of ℱ[x] can be written in the form c(t) f_0(t, x), where f_0 is a primitive
polynomial in F[t, x] and c is a rational function in t, both uniquely determined up to
constant factor.

The proofs of the next assertions are almost identical to the proofs of Proposition 12.3.4
and Theorems 12.3.6 and 12.3.8.


Theorem 12.3.9 Let F[t] be a polynomial ring in one variable over a field F, and let
ℱ = F(t) be its field of fractions.

(a) The product of primitive polynomials in F[t, x] is primitive.

(b) Let f_0 be a primitive polynomial, and let g be a polynomial in F[t, x]. If f_0 divides g in
ℱ[x], then f_0 divides g in F[t, x].

(c) If two polynomials f and g in F[t, x] have a common nonconstant factor in ℱ[x], they
have a common nonconstant factor in F[t, x].

(d) Let f be an element of F[t, x] whose leading coefficient is monic. Then f is an
irreducible element of F[t, x] if and only if it is either an irreducible polynomial in t
alone, or a primitive polynomial that is irreducible in ℱ[x].

(e) The ring F[t, x] is a unique factorization domain. □


The results about factoring in Z[x] also have analogues for polynomials with coefficients 
in any unique factorization domain R. 


Theorem 12.3.10 If R is a unique factorization domain, the polynomial ring R[x_1, ..., x_n]
in any number of variables is a unique factorization domain.


Note: In contrast to the case of one variable, where every complex polynomial is a product of 
linear polynomials, complex polynomials in two variables are often irreducible, and therefore 
prime elements, of C[t, x]. □


12.4 FACTORING INTEGER POLYNOMIALS 


We pose the problem of factoring an integer polynomial
(12.4.1) f(x) = a_n x^n + ··· + a_1 x + a_0,
with a_n ≠ 0. Linear factors can be found fairly easily.


Lemma 12.4.2 

(a) If an integer polynomial b_1 x + b_0 divides f in Z[x], then b_1 divides a_n and b_0
divides a_0.

(b) A primitive polynomial b_1 x + b_0 divides f in Z[x] if and only if the rational number
−b_0/b_1 is a root of f.

(c) A rational root of a monic integer polynomial f is an integer.


Proof. (a) The constant coefficient of a product (b_1 x + b_0)(q_{n−1} x^{n−1} + ··· + q_0) is b_0 q_0,
and if q_{n−1} ≠ 0, the leading coefficient is b_1 q_{n−1}.


(b) According to Theorem 12.3.6(a), b_1 x + b_0 divides f in Z[x] if and only if it divides f in
Q[x], and this is true if and only if x + b_0/b_1 divides f, i.e., −b_0/b_1 is a root.

(c) If α = a/b is a root, written with b > 0, and if gcd(a, b) = 1, then bx − a is a primitive
polynomial that divides the monic polynomial f, so b = 1 and α is an integer. □
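
Lemma 12.4.2 leaves only finitely many candidates for a rational root: fractions ±b_0/b_1 with b_1 dividing a_n and b_0 dividing a_0. A brute-force search over those candidates might look as follows. The function names and the coefficient ordering (lowest degree first) are assumptions of this sketch, and it assumes that both a_0 and a_n are nonzero.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of an integer polynomial a_0 + a_1 x + ... + a_n x^n.
    Assumes a_0 != 0 and a_n != 0."""
    a0, an = coeffs[0], coeffs[-1]
    roots = set()
    for b1 in divisors(an):               # b1 divides the leading coefficient
        for b0 in divisors(a0):           # b0 divides the constant coefficient
            for cand in (Fraction(-b0, b1), Fraction(b0, b1)):
                if sum(c * cand**k for k, c in enumerate(coeffs)) == 0:
                    roots.add(cand)
    return sorted(roots)

# 6x^3 + 9x^2 + 9x + 3 has the primitive linear factor 2x + 1.
roots_a = rational_roots([3, 9, 9, 6])

# x^3 - 7x + 6 is monic, so its rational roots are integers, as in part (c).
roots_b = rational_roots([6, -7, 0, 1])
```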


The homomorphism ψ_p: Z[x] → F_p[x] (12.3.1) is useful for explicit factoring, one
reason being that there are only finitely many polynomials in F_p[x] of each degree.


Proposition 12.4.3. Let f(x) = a_n x^n + ··· + a_0 be an integer polynomial, and let p be a
prime integer that does not divide the leading coefficient a_n. If the residue f̄ of f modulo p
is an irreducible element of F_p[x], then f is an irreducible element of Q[x].


Proof. We prove the contrapositive, that if f is reducible, then f̄ is reducible. Suppose that
f = gh is a proper factorization of f in Q[x]. We may assume that g and h are in Z[x]
(12.3.6). Since the factorization in Q[x] is proper, both g and h have positive degree, and, if
deg f denotes the degree of f, then deg f = deg g + deg h.

Since ψ_p is a homomorphism, f̄ = ḡh̄, so deg f̄ = deg ḡ + deg h̄. For any integer
polynomial q, deg q̄ ≤ deg q. Our assumption on the leading coefficient of f tells us that
deg f̄ = deg f. This being so we must have deg ḡ = deg g and deg h̄ = deg h. Therefore
the factorization f̄ = ḡh̄ is proper. □


If p divides the leading coefficient of f, then f̄ has lower degree, and using reduction
modulo p becomes harder.


If we suspect that an integer polynomial is irreducible, we can try reduction modulo p
for a small prime, p = 2 or 3 for instance, and hope that f̄ turns out to be irreducible and of
the same degree as f. If so, f will be irreducible too. Unfortunately, there exist irreducible
integer polynomials that can be factored modulo every prime p. The polynomial x^4 − 10x^2 + 1
is an example. So the method of reduction modulo p may not work. But it does work
quite often.


The irreducible polynomials in F_p[x] can be found by the “sieve” method. The sieve
of Eratosthenes is the name given to the following method of determining the prime integers 
less than a given number n. We list the integers from 2 to n. The first one, 2, is prime because 
any proper factor of 2 must be smaller than 2, and there is no smaller integer on our list. We 
note that 2 is prime, and we cross out the multiples of 2 from our list. Except for 2 itself, 
they are not prime. The first integer that is left, 3, is a prime because it isn’t divisible by any 
smaller prime. We note that 3 is a prime and then cross out the multiples of 3 from our list. 
Again, the smallest remaining integer, 5, is a prime, and so on. 


2  3  4̶  5  6̶  7  8̶  9̶  1̶0̶  11  1̶2̶  13  1̶4̶  1̶5̶  1̶6̶  17  1̶8̶  19


The same method will determine the irreducible polynomials in F_p[x]. We list the
monic polynomials, degree by degree, and cross out products. For example, the linear
polynomials in F_2[x] are x and x + 1. They are irreducible. The polynomials of degree 2 are
x^2, x^2 + x, x^2 + 1, and x^2 + x + 1. The first three have roots in F_2, so they are divisible by x
or by x + 1. The last one, x^2 + x + 1, is the only irreducible polynomial of degree 2 in F_2[x].

(12.4.4) The irreducible polynomials of degree ≤ 4 in F_2[x]:

x, x + 1; x^2 + x + 1; x^3 + x^2 + 1, x^3 + x + 1;
x^4 + x^3 + 1, x^4 + x + 1, x^4 + x^3 + x^2 + x + 1.
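
The sieve for F_2[x] is easy to mechanize. In the sketch below a polynomial over F_2 is encoded as a Python integer whose bit k is the coefficient of x^k, so that x^2 + x + 1 becomes 0b111. This encoding and the names clmul, irreducibles_f2 and show are choices made for the illustration; they do not come from the text.

```python
def clmul(a, b):
    """Product of two polynomials over F_2, each encoded as a bitmask."""
    out = 0
    while b:
        if b & 1:
            out ^= a          # addition in F_2[x] is XOR
        a <<= 1               # multiply a by x
        b >>= 1
    return out

def irreducibles_f2(max_degree):
    """List the polynomials of degree 1..max_degree and cross out products."""
    limit = 1 << (max_degree + 1)      # encodings of degree <= max_degree
    crossed = set()
    for a in range(2, limit):          # a and b have degree >= 1
        for b in range(a, limit):
            prod = clmul(a, b)
            if prod < limit:
                crossed.add(prod)
    return [f for f in range(2, limit) if f not in crossed]

def show(f):
    """Human-readable form of an encoded polynomial."""
    terms = ["1" if k == 0 else "x" if k == 1 else f"x^{k}"
             for k in range(f.bit_length() - 1, -1, -1) if (f >> k) & 1]
    return " + ".join(terms)

for f in irreducibles_f2(4):
    print(show(f))
```

The loop prints eight polynomials, which are the ones listed in (12.4.4).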


By trying the polynomials on this list, we can factor polynomials of degree at most 9 in
F_2[x]. For example, let’s factor f(x) = x^5 + x^3 + 1 in F_2[x]. If it factors, there must be an
irreducible factor of degree at most 2. Neither 0 nor 1 is a root, so f has no linear factor.
There is only one irreducible polynomial of degree 2, namely p = x^2 + x + 1. We carry out
division with remainder: f(x) = p(x)(x^3 + x^2 + x) + (x + 1). So p doesn’t divide f, and
therefore f is irreducible.

Consequently, the integer polynomial x^5 − 64x^4 + 127x^3 − 200x + 99 is irreducible in
Q[x], because its residue in F_2[x] is the irreducible polynomial x^5 + x^3 + 1.
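
The division with remainder in this example, and the reduction of the integer polynomial modulo 2, can be checked mechanically. As an assumption of this sketch, polynomials over F_2 are encoded as integers whose bit k is the coefficient of x^k; the helper names divmod_f2 and reduce_mod_2 are hypothetical.

```python
def divmod_f2(f, p):
    """Division with remainder in F_2[x]; polynomials are bitmasks."""
    q = 0
    dp = p.bit_length() - 1                    # degree of the divisor
    while f and f.bit_length() - 1 >= dp:
        shift = f.bit_length() - 1 - dp
        q |= 1 << shift
        f ^= p << shift                        # subtract x^shift * p
    return q, f

def reduce_mod_2(coeffs):
    """Residue modulo 2 of an integer polynomial, highest degree first."""
    out = 0
    for c in coeffs:
        out = (out << 1) | (c % 2)
    return out

f = 0b101001                                   # x^5 + x^3 + 1
q, r = divmod_f2(f, 0b111)                     # divide by x^2 + x + 1

# No polynomial of degree 1 or 2 divides f, so f is irreducible in F_2[x].
no_small_factor = all(divmod_f2(f, d)[1] != 0 for d in (0b10, 0b11, 0b111))

# x^5 - 64x^4 + 127x^3 - 200x + 99 reduces to f modulo 2.
residue = reduce_mod_2([1, -64, 127, 0, -200, 99])
```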


(12.4.5) The monic irreducible polynomials of degree 2 in F_3[x]:


x^2 + 1, x^2 + x − 1, x^2 − x − 1.


Reduction modulo p may help describe the factorization of a polynomial also when the
residue is reducible. Consider the polynomial f(x) = x^3 + 3x^2 + 9x + 6. Reducing modulo
3, we obtain x^3. This doesn’t look like a promising tool. However, suppose that f(x) were
reducible in Z[x], say f(x) = (x + a)(x^2 + bx + c). Then the residue of x + a would divide x^3
in F_3[x], which would imply a ≡ 0 modulo 3. Similarly, we could conclude c ≡ 0 modulo 3. It
is impossible to satisfy both of these conditions because the constant term ac of the product,
which would then be divisible by 9,
is supposed to be equal to 6. Therefore no such factorization exists, and f(x) is irreducible.

The principle at work in this example is called the Eisenstein Criterion. 


Proposition 12.4.6 Eisenstein Criterion. Let f(x) = a_n x^n + ··· + a_0 be an integer polynomial
and let p be a prime integer. Suppose that the coefficients of f satisfy the following conditions:


• p does not divide a_n;
• p divides all other coefficients a_{n−1}, ..., a_0;
• p^2 does not divide a_0.


Then f is an irreducible element of Q[x].


For example, the polynomial x^4 + 25x^2 + 30x + 20 is irreducible in Q[x].
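
The three conditions of the Eisenstein Criterion are simple divisibility tests, so they are easy to check by machine. The function name eisenstein_applies and the coefficient ordering (a_0 first) are assumptions of this sketch, and p is assumed to be prime. The criterion is only sufficient: a result of False says nothing about reducibility.

```python
def eisenstein_applies(coeffs, p):
    """coeffs lists a_0, ..., a_n (lowest degree first); p is assumed prime."""
    *lower, lead = coeffs
    return (lead % p != 0                          # p does not divide a_n
            and all(a % p == 0 for a in lower)     # p divides a_{n-1}, ..., a_0
            and coeffs[0] % (p * p) != 0)          # p^2 does not divide a_0

# x^4 + 25x^2 + 30x + 20 with p = 5
ok_quartic = eisenstein_applies([20, 30, 25, 0, 1], 5)

# x^3 + 3x^2 + 9x + 6 with p = 3
ok_cubic = eisenstein_applies([6, 9, 3, 1], 3)
```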


Proof of the Eisenstein Criterion. Assume that f satisfies the conditions, and let f̄ denote
the residue of f modulo p. The hypotheses imply that f̄ = ā_n x^n and that ā_n ≠ 0. If f is
reducible in Q[x], it will factor in Z[x] into factors of positive degree, say f = gh, where
g(x) = b_r x^r + ··· + b_0 and h(x) = c_s x^s + ··· + c_0. Then ḡ divides ā_n x^n, so ḡ has the form
b̄_r x^r. Every coefficient of g except the leading coefficient is divisible by p. The same is true
of h. The constant coefficient a_0 of f will be equal to b_0 c_0, and since p divides b_0 and c_0,
p^2 must divide a_0. This contradicts the third condition. Therefore f is irreducible. □

One application of the Eisenstein Criterion is to prove the irreducibility of the
cyclotomic polynomial Φ(x) = x^{p−1} + x^{p−2} + ··· + x + 1, where p is a prime. Its roots are
the pth roots of unity, the powers of ζ = e^{2πi/p}, different from 1:


(12.4.7) (x − 1)Φ(x) = x^p − 1.


Lemma 12.4.8 Let p be a prime integer. The binomial coefficient \binom{p}{r} is an integer divisible
exactly once by p for every r in the range 1 ≤ r < p.


Proof. The binomial coefficient \binom{p}{r} is


\binom{p}{r} = p(p − 1) ··· (p − r + 1) / (r(r − 1) ··· 1).


When r < p, the terms in the denominator are all less than p, so they cannot cancel the
single p that is in the numerator. Therefore \binom{p}{r} is divisible exactly once by p. □


Theorem 12.4.9 Let p be a prime. The cyclotomic polynomial Φ(x) = x^{p−1} + x^{p−2} + ··· +
x + 1 is irreducible over Q.


Proof. We substitute x = y + 1 into (12.4.7) and expand the result:


yΦ(y + 1) = (y + 1)^p − 1 = y^p + \binom{p}{1} y^{p−1} + ··· + \binom{p}{p−1} y + 1 − 1.


We cancel y. The lemma shows that the Eisenstein Criterion applies, and that Φ(y + 1) is
irreducible. It follows that Φ(x) is irreducible too. □
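
For small primes, both Lemma 12.4.8 and the shape of Φ(y + 1) can be confirmed numerically. The sketch below uses math.comb from the Python standard library; the helper names valuation and shifted_cyclotomic are hypothetical. It is a spot check for a few primes, not a substitute for the proof.

```python
from math import comb

def valuation(n, p):
    """Largest e such that p**e divides the nonzero integer n."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

def shifted_cyclotomic(p):
    """Coefficients of Phi(y + 1) = ((y + 1)^p - 1) / y, lowest degree first."""
    return [comb(p, r) for r in range(1, p + 1)]

primes = (2, 3, 5, 7, 11, 13)

# Lemma 12.4.8: each binomial coefficient with 1 <= r < p has exactly one factor p.
lemma_holds = all(valuation(comb(p, r), p) == 1
                  for p in primes for r in range(1, p))

# For p = 5 the shifted polynomial is y^4 + 5y^3 + 10y^2 + 10y + 5.
example = shifted_cyclotomic(5)
```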


Estimating the Coefficients 


Computer programs factor integer polynomials by factoring modulo powers of a prime,
usually the prime p = 2. There are fast algorithms, the Berlekamp algorithms, to do this.
The simplest case is that f is a monic integer polynomial whose residue modulo p is the
product of relatively prime monic polynomials, say f̄ = ḡh̄ in F_p[x]. Then there will be a
unique way to factor f modulo any power of p. (We won’t take the time to prove this.)
Let’s suppose that this is so, and that we (or the computer) have factored modulo the powers
p, p^2, p^3, .... If f factors in Z[x], the coefficients of the factors modulo p^k will stabilize
when they are represented by integers between −p^k/2 and p^k/2, and this will produce the
integer factorization. If f is irreducible in Z[x], the coefficients of the factors won’t stabilize.
When they get too big, one can conclude that the polynomial is irreducible.

The next theorem of Cauchy can be used to estimate how big the coefficients of the
integer factors could be.


Theorem 12.4.10 Let f(x) = x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0 be a monic polynomial with
complex coefficients, and let r be the maximum of the absolute values |a_i| of its coefficients.
The roots of f have absolute value less than r + 1.

Proof of Theorem 12.4.10. The trick is to rewrite the expression for f in the form

x^n = f(x) − (a_{n−1} x^{n−1} + ··· + a_1 x + a_0)


and to use the triangle inequality:


(12.4.11) |x|^n ≤ |f(x)| + |a_{n−1}||x|^{n−1} + ··· + |a_1||x| + |a_0|
              ≤ |f(x)| + r(|x|^{n−1} + ··· + |x| + 1) = |f(x)| + r (|x|^n − 1)/(|x| − 1).

Let α be a complex number with absolute value |α| ≥ r + 1. Then r/(|α| − 1) ≤ 1. We substitute
x = α into (12.4.11):

|α|^n ≤ |f(α)| + r (|α|^n − 1)/(|α| − 1) ≤ |f(α)| + |α|^n − 1.


Therefore |f(α)| ≥ 1, and α is not a root of f. □
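
The final inequality of the proof, |f(α)| ≥ 1 whenever |α| ≥ r + 1, can be observed numerically. The sketch below samples points on the circle of radius r + 1 for one monic polynomial with r = 1. The helper names evaluate and cauchy_bound and the choice of 360 sample points are assumptions of this illustration, and the check uses floating-point arithmetic.

```python
import cmath

def evaluate(coeffs, x):
    """Horner evaluation; coeffs are listed from highest degree to lowest."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def cauchy_bound(coeffs):
    """Return r + 1 for a monic polynomial given highest degree first."""
    if coeffs[0] != 1:
        raise ValueError("the theorem is stated for monic polynomials")
    return max(abs(c) for c in coeffs[1:]) + 1

f = [1, 0, -1, 1, 1, 0, 1]                 # x^6 - x^4 + x^3 + x^2 + 1
bound = cauchy_bound(f)                    # r = 1, so the bound is 2

# Sample |f| on the circle |x| = r + 1; the proof says it never drops below 1.
samples = [cmath.rect(bound, 2 * cmath.pi * k / 360) for k in range(360)]
smallest = min(abs(evaluate(f, a)) for a in samples)
```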


We give two examples in which r = 1. 


Examples 12.4.12 (a) Let f(x) = x^6 + x^4 + x^3 + x^2 + 1. The irreducible factorization
modulo 2 is


x^6 + x^4 + x^3 + x^2 + 1 = (x^2 + x + 1)(x^4 + x^3 + x^2 + x + 1).

Since the factors are distinct, there is just one way to factor f modulo 2^2, and it is

x^6 + x^4 + x^3 + x^2 + 1 = (x^2 − x + 1)(x^4 + x^3 + x^2 + x + 1), modulo 4.

The factorizations modulo 2^3 and modulo 2^4 are the same. If we had made these computations,
we would guess that this is an integer factorization, which it is.


(b) Let f(x) = x^6 − x^4 + x^3 + x^2 + 1. This polynomial factors in the same way modulo 2. If
f were reducible in Z[x], it would have a quadratic factor x^2 + ax + b, and b would be the
product of two roots of f. Cauchy’s theorem tells us that the roots have absolute value less
than 2, so |b| < 4. Computing modulo 2^4,


x^6 − x^4 + x^3 + x^2 + 1 = (x^2 + x − 5)(x^4 − x^3 + 5x^2 + 7x + 3), modulo 16.


The constant coefficient of the quadratic factor is −5. This is too big, so f is irreducible.


Note: It isn’t necessary to use Cauchy’s Theorem here. Since the constant coefficient of f is
1, the fact that −5 ≢ ±1 modulo 16 also proves that f is irreducible. □
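
Factorizations such as these are tedious to verify by hand but easy to check by multiplying out. The sketch below does this for both examples. The function name poly_mul_mod and the coefficient ordering (lowest degree first) are assumptions of the illustration. The quartic cofactor x^4 − x^3 + 5x^2 + 7x + 3 used for (b) was obtained here by solving the coefficient congruences modulo 16 with constant term −5 for the quadratic factor, and the code confirms it.

```python
def poly_mul_mod(p, q, m=None):
    """Multiply integer polynomials (lowest degree first); reduce mod m if given."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out if m is None else [c % m for c in out]

# (a) x^6 + x^4 + x^3 + x^2 + 1 = (x^2 - x + 1)(x^4 + x^3 + x^2 + x + 1) over Z
fa = [1, 0, 1, 1, 1, 0, 1]
product_a = poly_mul_mod([1, -1, 1], [1, 1, 1, 1, 1])

# (b) x^6 - x^4 + x^3 + x^2 + 1 and the factorization modulo 16
fb = [1, 0, 1, 1, -1, 0, 1]
product_b = poly_mul_mod([-5, 1, 1], [3, 7, 5, -1, 1], 16)
target_b = [c % 16 for c in fb]

# Over the integers the same product is not f, consistent with f being irreducible.
product_b_exact = poly_mul_mod([-5, 1, 1], [3, 7, 5, -1, 1])
```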


The computer implementations for factoring are interesting, but they are painful to 
carry out by hand. It is unpleasant to determine a factorization modulo 16 such as the one 
above by hand, though it can be done by linear algebra. We won’t discuss computer methods 
further. If you want to pursue this topic, see [LL&L].


12.5 GAUSS PRIMES


We have seen that the ring Z[i] of Gauss integers is a Euclidean domain. Every element that 
is not zero and not a unit is a product of prime elements. In this section we describe these 
prime elements, called Gauss primes, and their relation to integer primes. 

In Z[i], 5 = (2 + i)(2 − i), and the factors 2 + i and 2 − i are Gauss primes. On the
other hand, the integer 3 doesn’t have a proper factor in Z[i]. It is itself a Gauss prime. These 
examples exhibit the two ways that prime integers can factor in the ring of Gauss integers. 

The next lemma follows directly from the definition of a Gauss integer: 


Lemma 12.5.1 
• A Gauss integer that is a real number is an integer.


• An integer d divides a Gauss integer a + bi in the ring Z[i] if and only if d divides both a
and b in Z. □


Theorem 12.5.2 


(a) Let π be a Gauss prime, and let π̄ be its complex conjugate. Then π̄π is either an integer
prime or the square of an integer prime.

(b) Let p be an integer prime. Then p is either a Gauss prime or the product π̄π of a Gauss
prime and its complex conjugate.

(c) The integer primes p that are Gauss primes are those congruent to 3 modulo 4:
p = 3, 7, 11, 19, ...

(d) Let p be an integer prime. The following are equivalent:


(i) p is the product of complex conjugate Gauss primes.

(ii) p is congruent to 1 modulo 4, or p = 2: p = 2, 5, 13, 17, ...
(iii) p is the sum of two integer squares: p = a^2 + b^2.
(iv) The residue of −1 is a square modulo p.


Proof of Theorem 12.5.2 (a) Let π be a Gauss prime, say π = a + bi. We factor the positive
integer π̄π = a^2 + b^2 in the ring of integers: π̄π = p_1 ··· p_k. This equation is also true in the
Gauss integers, though it is not necessarily a prime factorization in that ring. We continue
factoring each p_i if possible, to arrive at a prime factorization in Z[i]. Because the Gauss
integers have unique factorization, the prime factors we obtain must be associates of the two
factors π and π̄. Therefore k is at most two. Either π̄π is an integer prime, or else it is the
product of two integer primes. Suppose that π̄π = p_1 p_2, and say that π is an associate of
the integer prime p_1, i.e., that π = ±p_1 or ±i p_1. Then π̄ is also an associate of p_1, and it is
an associate of the remaining factor p_2 as well, so p_1 = p_2, and π̄π = p_1^2.


(b) If p is an integer prime, it is not a unit in Z[i]. (The units are ±1, ±i.) So p is divisible by
a Gauss prime π. Then π̄ divides p̄, and p̄ = p. So the integer π̄π divides p^2 in Z[i] and
also in Z. Therefore π̄π is equal to p or p^2. If π̄π = p^2, then π and p are associates, so p is
a Gauss prime.


Part (c) of the theorem follows from (b) and (d), so we need not consider it further, and we
turn to the proof of (d). It is easy to see that (d)(i) and (d)(iii) are equivalent: If p = π̄π
for some Gauss prime, say π = a + bi, then p = a^2 + b^2 is a sum of two integer squares.
Conversely, if p = a^2 + b^2, then p factors in the Gauss integers: p = (a − bi)(a + bi), and
(a) shows that the two factors are Gauss primes. □


Lemma 12.5.3 below shows that (d)(i) and (d)(iv) are equivalent, because (12.5.3)(a) 
is the negation of (d)(i) and (12.5.3)(c) is the negation of (d)(iv). 


Lemma 12.5.3. Let p be an integer prime. The following statements are equivalent:
(a) p is a Gauss prime;

(b) the quotient ring R = Z[i]/(p) is a field;

(c) x^2 + 1 is an irreducible element of F_p[x] (12.2.8)(e).


Proof. The equivalence of the first two statements follows from the fact that Z[i]/(p) is a 
field if and only if the principal ideal (p) of Z[i] is a maximal ideal, and this is true if and 
only if p is a Gauss prime (see (12.2.9)). 


What we are really after is the equivalence of (a) and (c), and at a first glance these 
statements don’t seem to be related at all. It is in order to obtain this equivalence that we 
introduce the auxiliary ring R = Z[i]/(p). This ring can be obtained from the polynomial 
ring Z[x] in two steps: first killing the polynomial x^2 + 1, which yields a ring isomorphic to
Z[i], and then killing the prime p in that ring. We may just as well introduce these relations
in the opposite order. Killing the prime p first gives us the polynomial ring F_p[x], and then
killing x^2 + 1 yields R again, as is summed up in the diagram below.


                    kill p
(12.5.4)    Z[x] ------------> F_p[x]
              |                  |
   kill x^2+1 |                  | kill x^2+1
              v                  v
            Z[i] ------------>   R
                    kill p


We now have two ways to decide whether or not R is a field. First, R will be a field if
and only if the ideal (p) in the ring Z[i] is a maximal ideal, which will be true if and only if p
is a Gauss prime. Second, R will be a field if and only if the ideal (x^2 + 1) in the ring F_p[x]
is a maximal ideal, which will be true if and only if x^2 + 1 is an irreducible element of that
ring (12.2.9). This shows that (a) and (c) of Lemma 12.5.3 are equivalent. □


To complete the proof of equivalence of (i)—(iv) of Theorem 12.5.2(d), it suffices to 
show that (ii) and (iv) are equivalent. It is true that -1 is a square modulo 2. We look at the 
primes different from 2. The next lemma does the job: 


Lemma 12.5.5 Let p be an odd prime.


(a) The multiplicative group F_p^× contains an element of order 4 if and only if p ≡ 1
modulo 4.

(b) The integer a solves the congruence x^2 ≡ −1 modulo p if and only if its residue ā is an
element of order 4 in the multiplicative group F_p^×.

Proof. (a) This follows from a fact mentioned before, that the multiplicative group F_p^× is a
cyclic group (see (15.7.3)). We give an ad hoc proof here. The order of an element divides the
order of the group. So if ā has order 4 in F_p^×, then the order of F_p^×, which is p − 1, is divisible
by 4. Conversely, suppose that p − 1 is divisible by 4. We consider the homomorphism
φ: F_p^× → F_p^× that sends x ⇝ x^2. The only elements of F_p^× whose squares are 1 are ±1 (see
(12.2.20)). So the kernel of φ is {±1}. Therefore its image, call it H, has even order (p − 1)/2.
The first Sylow Theorem shows that H contains an element of order 2. That element is the
square of an element x of order 4.


(b) The residue ā has order 4 if and only if ā^2 has order 2. There is just one element in F_p^× of
order 2, namely the residue of −1. So ā has order 4 if and only if ā^2 = −1. □


This completes the proof of Theorem 12.5.2. □
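
The equivalence of conditions (ii), (iii), and (iv) of Theorem 12.5.2(d) can be observed for small primes by brute force. In the sketch below the function names is_prime, two_squares and minus_one_is_square, as well as the search bound 200, are arbitrary choices made for the illustration.

```python
from math import isqrt

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def two_squares(p):
    """Return (a, b) with a*a + b*b == p and 0 < a <= b, or None."""
    a = 1
    while 2 * a * a <= p:
        b = isqrt(p - a * a)
        if b * b == p - a * a:
            return a, b
        a += 1
    return None

def minus_one_is_square(p):
    """True if x^2 = -1 has a solution modulo p."""
    return any((x * x + 1) % p == 0 for x in range(p))

checked = []
for p in filter(is_prime, range(2, 200)):
    splits = p == 2 or p % 4 == 1                          # condition (ii)
    same = ((two_squares(p) is not None) == splits         # condition (iii)
            and minus_one_is_square(p) == splits)          # condition (iv)
    checked.append(same)

# The primes that remain Gauss primes are those with no such representation.
inert = [p for p in range(2, 30) if is_prime(p) and two_squares(p) is None]
```

For instance, two_squares(13) returns (2, 3), which corresponds to the factorization 13 = (2 + 3i)(2 − 3i) in Z[i].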


You want to hit home run without going into spring training? 


—Kenkichi Iwasawa


EXERCISES 


Section 1 Factoring Integers 
1.1. Prove that a positive integer n that is not an integer square is not the square of a rational 
number. 
1.2. (partial fractions) 


(a) Write the fraction 7/24 in the form a/8 + b/3. 


(b) Prove that if n = uv, where u and v are relatively prime, then every fraction
q = m/n can be written in the form q = a/u + b/v.


1.3. (Chinese Remainder Theorem) 


(a) Let n and m be relatively prime integers, and let a and b be arbitrary integers. Prove
that there is an integer x that solves the simultaneous congruence x ≡ a modulo m
and x ≡ b modulo n.


(b) Determine all solutions of these two congruences. 


1.4. Solve the following simultaneous congruences: 


(a) x ≡ 3 modulo 8, x ≡ 2 modulo 5,
(b) x ≡ 3 modulo 15, x ≡ 5 modulo 8, x ≡ 2 modulo 7,
(c) x ≡ 13 modulo 43, x ≡ 7 modulo 71.


1.5. Let a and b be relatively prime integers. Prove that there are integers m and n such that
a^m + b^n ≡ 1 modulo ab.

Section 2 Unique Factorization Domains

2.1. Factor the following polynomials into irreducible factors in F_p[x].
(a) x^3 + x^2 + x + 1, p = 2, (b) x^2 − 3x − 3, p = 5, (c) x^2 + 1, p = 7.

2.2. Compute the greatest common divisor of the polynomials x^6 + x^4 + x^3 + x^2 + x + 1 and
x^5 + 2x^3 + x^2 + x + 1 in Q[x].

2.3. How many roots does the polynomial x^2 − 2 have, modulo 8?

2.4. Euclid proved that there are infinitely many prime integers in the following way: If
p_1, ..., p_k are primes, then any prime factor p of (p_1 ··· p_k) + 1 must be different from
all of the p_i. Adapt this argument to prove that for any field F there are infinitely many
monic irreducible polynomials in F[x].

2.5. (partial fractions for polynomials)

(a) Prove that every element of C(x) can be written as a sum of a polynomial and a
linear combination of functions of the form 1/(x − a)^i.

(b) Exhibit a basis for the field C(x) of rational functions as vector space over C.

2.6. Prove that the following rings are Euclidean domains.
(a) Z[ω], ω = e^{2πi/3}, (b) Z[√−2].

2.7. Let a and b be integers. Prove that their greatest common divisor in the ring of integers
is the same as their greatest common divisor in the ring of Gauss integers.

2.8. Describe a systematic way to do division with remainder in Z[i]. Use it to divide 4 + 36i
by 5 + i.

2.9. Let F be a field. Prove that the ring F[x, x^{−1}] of Laurent polynomials (Chapter 11,
Exercise 5.7) is a principal ideal domain.

2.10. Prove that the ring R[[t]] of formal power series (Chapter 11, Exercise 2.2) is a unique
factorization domain.


Section 3 Gauss’s Lemma 


3.1. Let φ denote the homomorphism Z[x] → R defined by
(a) φ(x) = 1 + √2, (b) φ(x) = 1/2 + √2.
Is the kernel of φ a principal ideal? If so, find a generator.

3.2. Prove that two integer polynomials are relatively prime elements of Q[x] if and only if
the ideal they generate in Z[x] contains an integer.

3.3. State and prove a version of Gauss's Lemma for Euclidean domains.

3.4. Let x, y, z, w be variables. Prove that xy − zw, the determinant of a variable 2×2 matrix,
is an irreducible element of the polynomial ring C[x, y, z, w].

3.5. (a) Consider the map ψ: C[x, y] → C[t] defined by f(x, y) ⇝ f(t^2, t^3). Prove that its
image is the set of polynomials p(t) such that dp/dt(0) = 0.
(b) Consider the map φ: C[x, y] → C[t] defined by f(x, y) ⇝ f(t^2 − t, t^3 − t^2). Prove
that ker φ is a principal ideal, and find a generator g(x, y) for this ideal. Prove that
the image of φ is the set of polynomials p(t) such that p(0) = p(1). Give an intuitive
explanation in terms of the geometry of the variety {g = 0} in C^2.

3.6. Let α be a complex number. Prove that the kernel of the substitution map Z[x] → C that
sends x ⇝ α is a principal ideal, and describe its generator.


Section 4 Factoring Integer Polynomials 


4.1. (a) Factor x^9 − x and x^9 − 1 in F_3[x]. (b) Factor x^16 − x in F_2[x].

4.2. Prove that the following polynomials are irreducible:
(a) x^2 + 1, in F_7[x], (b) x^3 − 9, in F_31[x].

4.3. Decide whether or not the polynomial x^4 + 6x^3 + 9x + 3 generates a maximal ideal
in Q[x].

4.4. Factor the integer polynomial x^5 + 2x^4 + 3x^3 + 3x + 5 modulo 2, modulo 3, and in Q.

4.5. Which of the following polynomials are irreducible in Q[x]?
(a) x^2 + 27x + 213, (b) 8x^3 − 6x + 1, (c) x^3 + 6x^2 + 1, (d) x^5 − 3x^4 + 3.

4.6. Factor x^5 + 5x + 5 into irreducible factors in Q[x] and in F_2[x].

4.7. Factor x^3 + x + 1 in F_p[x], when p = 2, 3, and 5.

4.8. How might a polynomial f(x) = x^4 + bx^2 + c with coefficients in a field F factor in F[x]?
Explain with reference to the particular polynomials x^4 + 4x^2 + 4 and x^4 + 3x^2 + 4.

4.9. For which primes p and which integers n is the polynomial x^n − p irreducible in Q[x]?
4.10. Factor the following polynomials in Q[x]. (a) x^2 + 2351x + 125, (b) x^3 + 2x^2 + 3x + 1,
(ix 12x? + 2x? Gaede sae eax See cet 
(f) X9 leds (g) ye a eet tox? ae” Si x as. (i) x oe iJ 
(j) 3x^5 + 6x^4 + 9x^3 + 3x^2 − 1, (k) x^5 + x^4 + x^2 + x + 2.


4.11. Use the sieve method to determine the primes < 100, and discuss the efficiency of the
sieve: How quickly are the nonprimes filtered out?
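
The sieve method mentioned here is easy to try out by machine. The sketch below is one possible rendering of the sieve of Eratosthenes and is not taken from the text; the function name `primes_below` is an assumption made for illustration.

```python
def primes_below(limit):
    """Return the list of primes p < limit using the sieve of Eratosthenes."""
    is_prime = [True] * limit
    for k in range(min(2, limit)):
        is_prime[k] = False          # 0 and 1 are not prime
    p = 2
    while p * p < limit:
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes.
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False
        p += 1
    return [k for k in range(limit) if is_prime[k]]


print(primes_below(100))
```

Because the outer loop stops once p^2 reaches the limit, only the primes 2, 3, 5, and 7 do any crossing out when the limit is 100, which is one way to start thinking about the efficiency question.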


4.12. Determine:
(a) the monic irreducible polynomials of degree 3 over F_3,
(b) the monic irreducible polynomials of degree 2 over F_5,
(c) the number of irreducible polynomials of degree 3 over the field F_5.

4.13. Lagrange interpolation formula:
(a) Let a_0, ..., a_n be distinct complex numbers. Determine a polynomial p(x) of degree
n, which has a_1, ..., a_n as roots, and such that p(a_0) = 1.
(b) Let a_0, ..., a_d and b_0, ..., b_d be complex numbers, and suppose that the a_i are
distinct. There is a unique polynomial g of degree ≤ d such that g(a_i) = b_i for each
i = 0, ..., d. Determine the polynomial g explicitly in terms of a_i and b_i.

4.14. By analyzing the locus x^2 + y^2 = 1, prove that the polynomial x^2 + y^2 − 1 is irreducible
in C[x, y].

4.15. With reference to the Eisenstein criterion, what can one say when
(a) f is constant, (b) f = x^n + bx^{n−1}?

4.16. Factor x^14 + 8x^13 + 3 in Q[x], using reduction modulo 3 as a guide.

4.17. Using congruence modulo 4 as an aid, factor x^4 + 6x^3 + 7x^2 + 8x + 9 in Q[x].
*4.18. Let q = p^e with p prime, and let r = p^{e−1}. Prove that the cyclotomic polynomial
(x^q − 1)/(x^r − 1) is irreducible.

4.19. Factor x^5 − x^4 − x^2 − 1 modulo 2, modulo 16, and over Q.


Section 5 Gauss Primes 


5.1. Factor the following into primes in Z[i]: (a) 1 − 3i, (b) 10, (c) 6 + 9i, (d) 7 + i.

5.2. Find the greatest common divisor in Z[i] of (a) 11 + 7i, 4 + 7i, (b) 11 + 7i, 8 + i,
(c) 3 + 4i, 18 − i.

5.3. Find a generator for the ideal of Z[i] generated by 3 + 4i and 4 + 7i.

5.4. Make a neat drawing showing the primes in the ring of Gauss integers in a reasonable
size range.

5.5. Let π be a Gauss prime. Prove that π and π̄ are associates if and only if π is an associate
of an integer prime, or π̄π = 2.

5.6. Let R be the ring Z[√−3]. Prove that an integer prime p is a prime element of R if and
only if the polynomial x^2 + 3 is irreducible in F_p[x].

5.7. Describe the residue ring Z[i]/(p) for each prime p.

5.8. Let R = Z[ω], where ω = e^{2πi/3}. Make a drawing showing the prime elements of absolute
value < 10 in R.

5.9. Let R = Z[ω], where ω = e^{2πi/3}. Let p be an integer prime ≠ 3. Adapt the proof of
Theorem 12.5.2 to prove the following:
(a) The polynomial x^2 + x + 1 has a root in F_p if and only if p ≡ 1 modulo 3.
(b) (p) is a maximal ideal of R if and only if p ≡ −1 modulo 3.
(c) p factors in R if and only if it can be written in the form p = a^2 + ab + b^2, for some
integers a and b.

5.10. (a) Let α be a Gauss integer. Assume that α has no integer factor, and that ᾱα is a
square integer. Prove that α is a square in Z[i].
(b) Let a, b, c be integers such that a and b are relatively prime and a^2 + b^2 = c^2. Prove
that there are integers m and n such that a = m^2 − n^2, b = 2mn, and c = m^2 + n^2.

Miscellaneous Problems 


M.1. Let S be a commutative semigroup, that is, a set with a commutative and associative law
of composition and with an identity element (Chapter 2, Exercise M.4). Suppose the
Cancellation Law holds in S: If ab = ac then b = c. Make the appropriate definitions
and extend Proposition 12.2.14(a) to this situation.

M.2. Let v_1, ..., v_n be elements of Z^2, and let S be the semigroup of all combinations
a_1 v_1 + ··· + a_n v_n with non-negative integer coefficients a_i, the law of composition being
addition (Chapter 2, Exercise M.4). Determine which of these semigroups has unique
factorization (a) when the coordinates of the vectors v_i are nonnegative, and (b) in
general.

Hint: Begin by translating the terminology (12.2.1) into additive notation.


1Suggested by Nathaniel Kuhn. 
M.3. Let p be an integer prime, and let A be an n×n integer matrix such that A^p = I but
A ≠ I. Prove that n ≥ p − 1. Give an example with n = p − 1.

*M.4. (a) Let R be the ring of functions that are polynomials in cos t and sin t, with real
coefficients. Prove that R is isomorphic to R[x, y]/(x^2 + y^2 − 1), where R[x, y] is the
polynomial ring with real coefficients.
(b) Prove that R is not a unique factorization domain.
(c) Prove that S = C[x, y]/(x^2 + y^2 − 1) is a principal ideal domain and hence a unique
factorization domain.
(d) Determine the units in the rings S and R.
Hint: Show that S is isomorphic to a Laurent polynomial ring C[u, u^{-1}].

M.5. For which integers n does the circle x^2 + y^2 = n contain a point with integer coordinates?

M.6. Let R be a domain, and let I be an ideal that is a product of distinct maximal ideals in
two ways, say I = P_1 ··· P_r = Q_1 ··· Q_s. Prove that the two factorizations are the same,
except for the ordering of the terms.

M.7. Let R = Z[x].
(a) Prove that every maximal ideal in R has the form (p, f), where p is an integer prime
and f is a primitive integer polynomial that is irreducible modulo p.
(b) Let I be an ideal of R generated by two polynomials f and g that have no common
factor other than ±1. Prove that R/I is finite.

M.8. Let u and v be relatively prime integers, and let R' be the ring obtained from Z by
adjoining an element α with the relation vα = u. Prove that R' is isomorphic to Z[u/v]
and also to Z[1/v].

M.9. Let R denote the ring of Gauss integers, and let W be the R-submodule of V = R^2
generated by the columns of a 2×2 matrix with coefficients in R. Explain how to determine
the index [V : W].

M.10. Let f and g be polynomials in C[x, y] with no common factor. Prove that the ring
R = C[x, y]/(f, g) is a finite-dimensional vector space over C.

M.11. (Berlekamp's method) The problem here is to factor efficiently in F_2[x]. Solving linear
equations and finding a greatest common divisor are easy compared with factoring. The
derivative f' of a polynomial f is computed using the rule from calculus, but working
modulo 2. Prove:
(a) (square factors) The derivative f' is a square, and f' = 0 if and only if f is a square.
Moreover, gcd(f, f') is the product of powers of the square factors of f.
(b) (relatively prime factors) Let n be the degree of f. If f = uv, where u and v are
relatively prime, the Chinese Remainder Theorem shows that there is a polynomial
g of degree at most n such that g^2 − g ≡ 0 modulo f, and g can be found by solving
a system of linear equations. Either gcd(f, g) or gcd(f, g − 1) will be a proper
factor of f.
(c) Use this method to factor x^9 + x^6 + x^4 + 1.
Quadratic Number Fields


Rien n’est beau que le vrai. 


—Hermann Minkowski 


In this chapter, we see how ideals substitute for elements in some interesting rings. We will 
use various facts about plane lattices, and in order not to break up the discussion, we have 
collected them together in Section 13.10 at the end of the chapter. 


13.1 ALGEBRAIC INTEGERS 


A complex number α that is the root of a polynomial with rational coefficients is called an
algebraic number. The kernel of the substitution homomorphism φ: Q[x] → C that sends x
to an algebraic number α is a principal ideal, as are all ideals of Q[x]. It is generated by the
monic polynomial of lowest degree in Q[x] that has α as a root. If α is a root of a product
gh of polynomials, then it is a root of one of the factors. So the monic polynomial of lowest
degree with root α is irreducible. We call this polynomial the irreducible polynomial for α
over Q.

• An algebraic number is an algebraic integer if its (monic) irreducible polynomial over Q
has integer coefficients.

The cube root of unity ω = e^{2πi/3} = ½(−1 + √−3) is an algebraic integer because its
irreducible polynomial over Q is x^2 + x + 1, while α = ½(−1 + √3) is a root of the irreducible
polynomial x^2 + x − ½ and is not an algebraic integer.


Lemma 13.1.1 A rational number is an algebraic integer if and only if it is an ordinary integer. 


This is true because the irreducible polynomial over Q for a rational number a is x − a. □

A quadratic number field is a field of the form Q[√d], where d is a fixed integer,
positive or negative, which is not a square in Q. Its elements are the complex numbers

(13.1.2) a + b√d, with a and b in Q.

The notation √d stands for the positive real square root if d > 0 and for the positive
imaginary square root if d < 0. The field Q[√d] is a real quadratic number field if d > 0, and
an imaginary quadratic number field if d < 0.




If d has a square integer factor, we can pull it out of the radical without changing the 
field. So we assume d square-free. Then d can be any one of the integers 


d = −1, ±2, ±3, ±5, ±6, ±7, ±10, ...

We now determine the algebraic integers in a quadratic number field Q[√d]. Let δ
denote √d, let α = a + bδ be an element of Q[δ] that is not in Q, that is, with b ≠ 0, and let
α' = a − bδ. Then α and α' are roots of the polynomial

(13.1.3) (x − α')(x − α) = x^2 − 2ax + (a^2 − b^2 d),

which has rational coefficients. Since α is not a rational number, it is not the root of a linear
polynomial. So this quadratic polynomial is irreducible over Q. It is therefore the irreducible
polynomial for α over Q.


Corollary 13.1.4 A complex number α = a + bδ with a and b in Q is an algebraic integer if
and only if 2a and a^2 − b^2 d are ordinary integers. □

This corollary is also true when b = 0 and α = a.

The possibilities for a and b depend on congruence modulo 4. Since d is assumed to be
square free, we can't have d ≡ 0, so d ≡ 1, 2, or 3 modulo 4.
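
The criterion of Corollary 13.1.4 is easy to test with exact rational arithmetic. The following sketch is an illustration only; the function name `is_algebraic_integer` is hypothetical, and d is assumed to be a square-free integer.

```python
from fractions import Fraction


def is_algebraic_integer(a, b, d):
    """Test a + b*sqrt(d) using Corollary 13.1.4.

    a and b are rational numbers (int or Fraction), d is a square-free integer.
    The element is an algebraic integer exactly when 2a and a^2 - b^2*d
    are both ordinary integers.
    """
    a, b = Fraction(a), Fraction(b)
    two_a = 2 * a
    norm = a * a - b * b * d
    return two_a.denominator == 1 and norm.denominator == 1


half = Fraction(1, 2)
print(is_algebraic_integer(-half, half, -3))  # the cube root of unity
print(is_algebraic_integer(-half, half, 3))   # (-1 + sqrt(3))/2
```

The two sample calls correspond to the elements ½(−1 + √−3) and ½(−1 + √3) discussed earlier in this section: the first passes the test and the second fails it, because 1/4 − 3/4 = −1/2 is not an integer.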


Lemma 13.1.5 Let d be a square-free integer, and let r be a rational number. If r^2 d is an
integer, then r is an integer.

Proof. The square-free integer d cannot cancel a square in the denominator of r^2. □

A half integer is a rational number of the form m + ½, where m is an integer.

Proposition 13.1.6 The algebraic integers in the quadratic field F = Q[δ], with δ^2 = d and d
square free, have the form α = a + bδ, where:

• If d ≡ 2 or 3 modulo 4, then a and b are integers.
• If d ≡ 1 modulo 4, then a and b are either both integers, or both half integers.

The algebraic integers form a ring R, the ring of integers in F.

Proof. We assume that 2a and a^2 − b^2 d are integers, and we analyze the possibilities for a
and b. There are two cases: Either a is an integer, or a is a half integer.

Case 1: a is an integer. Then b^2 d must be an integer. The lemma shows that b is an integer.

Case 2: a = m + ½ is a half integer. Then a^2 = m^2 + m + ¼ will be in the set Z + ¼. Since
a^2 − b^2 d is an integer, b^2 d is also in Z + ¼. Then 4b^2 d is an integer and the lemma shows
that 2b is an integer. So b is a half integer, and then b^2 d is in the set Z + ¼ if and only if d ≡ 1
modulo 4.

The fact that the algebraic integers form a ring is proved by computation. □
The imaginary quadratic case d < 0 is easier to handle than the real case, so we
concentrate on it in the next sections. When d < 0, the algebraic integers form a lattice in the
complex plane. The lattice is rectangular if d ≡ 2 or 3 modulo 4, and "isosceles triangular" if
d ≡ 1 modulo 4.

When d = −1, R is the ring of Gauss integers, and the lattice is square. When d = −3,
the lattice is equilateral triangular. Two other examples are shown below.

(13.1.7) Integers in Some Imaginary Quadratic Fields.


Being a lattice is a very special property of the rings that we consider here, and the geometry 
of the lattices helps to analyze them. 

When d ≡ 2 or 3 modulo 4, the integers in Q[δ] are the complex numbers a + bδ, with
a and b integers. They form a ring that we denote by Z[δ]. A convenient way to write all the
integers when d ≡ 1 modulo 4 is to introduce the algebraic integer

(13.1.8) η = ½(1 + δ).

It is a root of the monic integer polynomial

(13.1.9) x^2 − x + h,

where h = (1 − d)/4. The algebraic integers in Q[δ] are the complex numbers a + bη, with
a and b integers. The ring of integers is Z[η].


13.2 FACTORING ALGEBRAIC INTEGERS


The symbol R will denote the ring of integers in an imaginary quadratic number field Q[δ].
To focus your attention, it may be best to think at first of the case that d is congruent to 2 or 3
modulo 4, so that the algebraic integers have the form a + bδ, with a and b integers.




When possible, we denote ordinary integers by Latin letters a, b, ..., elements of R
by Greek letters α, β, ..., and ideals by capital letters A, B, ... We work exclusively with
nonzero ideals.

If α = a + bδ is in R, its complex conjugate ᾱ = a − bδ is in R too. These are the roots
of the polynomial x^2 − 2ax + (a^2 − b^2 d) that was introduced in Section 13.1.

• The norm of α = a + bδ is N(α) = ᾱα.

The norm is equal to |α|^2 and also to a^2 − b^2 d. It is a positive integer for all α ≠ 0, and it has
the multiplicative property:

(13.2.1) N(βγ) = N(β)N(γ).

This property gives us some control of the factors of an element. If α = βγ, then both terms
on the right side of (13.2.1) are positive integers. To check for factors of α, it is enough to
look at elements β whose norms divide the norm of α. This is manageable when N(α) is
small. For one thing, it allows us to determine the units of R.


Proposition 13.2.2 Let R be the ring of integers in an imaginary quadratic number field.

• An element α of R is a unit if and only if N(α) = 1. If so, then α^{-1} = ᾱ.
• The units of R are {±1} unless d = −1 or −3.
• When d = −1, R is the ring of Gauss integers, and the units are the four powers of i.
• When d = −3, the units are the six powers of e^{2πi/6} = ½(1 + √−3).

Proof. If α is a unit, then N(α)N(α^{-1}) = N(1) = 1. Since N(α) and N(α^{-1}) are positive
integers, they are both equal to 1. Conversely, if N(α) = ᾱα = 1, then ᾱ is the inverse of α,
so α is a unit. The remaining assertions follow by inspection of the lattice R. □
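
The "inspection of the lattice" can also be carried out by a short search for elements of norm 1. In the sketch below, which is an illustration and not part of the text, an element of R is written as (x + yδ)/2 with integers x and y, so that half integer coefficients are covered when d ≡ 1 modulo 4; the function name `units` and this doubled-coordinate convention are assumptions made here.

```python
def units(d):
    """List the units of the ring of integers of Q[sqrt(d)], d < 0 square free.

    Each unit is returned as a pair (x, y) standing for (x + y*delta)/2.
    The norm of this element is (x^2 - d*y^2)/4, so units satisfy
    x^2 - d*y^2 == 4, which forces |x| <= 2 and |y| <= 2.
    """
    if d >= 0:
        raise ValueError("d must be negative")
    found = []
    for x in range(-2, 3):
        for y in range(-2, 3):
            if d % 4 == 1:
                in_ring = (x - y) % 2 == 0            # both integers or both half integers
            else:
                in_ring = x % 2 == 0 and y % 2 == 0   # integer coefficients only
            if in_ring and x * x - d * y * y == 4:
                found.append((x, y))
    return found


for d in (-1, -2, -3, -5, -7):
    print(d, units(d))
```

The search returns four units for d = −1, six for d = −3, and only the pairs (−2, 0) and (2, 0), that is ±1, for the other values tried.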


Corollary 13.2.3 Factoring terminates in the ring of integers in an imaginary quadratic 
number field. 


This follows from the fact that factoring terminates in the integers. If α = βγ is a proper
factorization in R, then N(α) = N(β)N(γ) is a proper factorization in Z.

Proposition 13.2.4 Let R be the ring of integers in an imaginary quadratic number field.
Assume that d ≡ 3 modulo 4. Then R is not a unique factorization domain except in the case
d = −1, when R is the ring of Gauss integers.

Proof. This is analogous to what happens when d = −5. Suppose that d ≡ 3 modulo 4 and
that d < −1. The integers in R have the form a + bδ with a, b ∈ Z, and the units are ±1. Let
e = (1 − d)/2. Then

2e = 1 − d = (1 + δ)(1 − δ).

The element 1 − d factors in two ways in R. Since d < −1, there is no element a + bδ whose
norm is equal to 2. Therefore 2, which has norm 4, is an irreducible element of R. If R were
a unique factorization domain, 2 would divide either 1 + δ or 1 − δ in R, which it does not:
½(1 ± δ) is not an element of R when d ≡ 3 modulo 4. □
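
The two facts used in this proof, that (1 + δ)(1 − δ) = 1 − d and that no element has norm 2, can be checked numerically for particular values of d. This is a small sketch under the assumption that elements a + bδ are stored as integer pairs (a, b); the helper names `mul` and `has_element_of_norm` are hypothetical.

```python
from math import isqrt


def mul(u, v, d):
    """Multiply a + b*delta and c + e*delta, where delta^2 = d."""
    (a, b), (c, e) = u, v
    return (a * c + d * b * e, a * e + b * c)


def has_element_of_norm(target, d):
    """Is there a + b*delta with integers a, b and a^2 - d*b^2 == target (d < 0)?"""
    bound = isqrt(target)  # since -d >= 1, both |a| and |b| are at most sqrt(target)
    return any(a * a - d * b * b == target
               for a in range(-bound, bound + 1)
               for b in range(-bound, bound + 1))


for d in (-5, -13, -17, -21):
    product = mul((1, 1), (1, -1), d)
    print(d, product, has_element_of_norm(2, d))
```

For each d in the loop the product is the ordinary integer 1 − d, and no element of norm 2 is found. For d = −1 the search does find one, namely 1 + i, which is consistent with the exception made for the Gauss integers.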
There is a similar statement for the case d ≡ 2 modulo 4. (This is Exercise 2.2.) But
note that the reasoning breaks down when d ≡ 1 modulo 4. In that case, ½(1 + δ) is in R, and
in fact there are more cases of unique factorization when d ≡ 1 modulo 4. A famous theorem
enumerates these cases:

Theorem 13.2.5 The ring of integers R in the imaginary quadratic field Q[√d] is a unique
factorization domain if and only if d is one of the integers −1, −2, −3, −7, −11, −19, −43, −67, −163.

Gauss proved that for these values of d, R has unique factorization. We will learn how to do
this. He also conjectured that there were no others. This much more difficult part of the
theorem was finally proved by Baker, Heegner, and Stark in the middle of the 20th century, after
people had worked on it for more than 150 years. We won't be able to prove their theorem.


13.3 IDEALS IN Z[√−5]

Before going to the general theory, we describe the ideals in the ring R = Z[√−5] as lattices
in the complex plane, using an ad hoc method.

Proposition 13.3.1 Let R be the ring of integers in an imaginary quadratic number field.
Every nonzero ideal of R is a sublattice of the lattice R. Moreover,

• If d ≡ 2 or 3 modulo 4, a sublattice A is an ideal if and only if δA ⊂ A.
• If d ≡ 1 modulo 4, a sublattice A is an ideal if and only if ηA ⊂ A (see (13.1.8)).

Proof. A nonzero ideal A contains a nonzero element α, and (α, αδ) is an independent set
over the real numbers. Also, A is discrete because it is a subgroup of the lattice R. Therefore
A is a lattice (Theorem 6.5.5).

To be an ideal, a subset of R must be closed under addition and under multiplication
by elements of R. Every sublattice A is closed under addition and multiplication by integers.
If A is also closed under multiplication by δ, then it is closed under multiplication by an
element of the form a + bδ, with a and b integers. This includes all elements of R if d ≡ 2 or
3 modulo 4. So A is an ideal. The proof in the case d ≡ 1 modulo 4 is similar. □

We describe ideals in the ring R = Z[δ], when δ^2 = −5.

Lemma 13.3.2 Let R = Z[δ] with δ^2 = −5. The lattice A of integer combinations of 2 and
1 + δ is an ideal.

Proof. The lattice A is closed under multiplication by δ, because δ · 2 and δ · (1 + δ) are
integer combinations of 2 and 1 + δ. □
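
The integer combinations used in this proof can be written out explicitly. In the sketch below, an illustration only, elements x + yδ of Z[δ] with δ^2 = −5 are stored as pairs (x, y); the helper names `times_delta` and `coords_in_A` are hypothetical.

```python
def times_delta(x, y, d=-5):
    """Return (x + y*delta) * delta = d*y + x*delta as a pair."""
    return (d * y, x)


def coords_in_A(x, y):
    """Return (m, n) with x + y*delta = m*2 + n*(1 + delta), or None if no
    integer solution exists.

    Comparing coefficients gives n = y and 2m + n = x, so the element lies
    in the lattice A exactly when x - y is even.
    """
    if (x - y) % 2 != 0:
        return None
    return ((x - y) // 2, y)


print(coords_in_A(*times_delta(2, 0)))   # coordinates of delta * 2
print(coords_in_A(*times_delta(1, 1)))   # coordinates of delta * (1 + delta)
```

The output says that δ · 2 = (−1) · 2 + 2 · (1 + δ) and δ · (1 + δ) = (−3) · 2 + 1 · (1 + δ), which is what the proof asserts. By contrast, `coords_in_A(1, 0)` returns `None`: the element 1 is not in A, so A is a proper ideal.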


Figure 13.3.4 shows this ideal. 


Theorem 13.3.3 Let R = Z[δ], where δ = √−5, and let A be a nonzero ideal of R. Let α be
a nonzero element of A of minimal norm (or minimal absolute value). Then either

• The set (α, αδ) is a lattice basis for A, and A is the principal ideal (α), or
• The set (α, ½(α + αδ)) is a lattice basis for A, and A is not a principal ideal.

This theorem has the following geometric interpretation: The lattice basis (α, αδ)
of the principal ideal (α) is obtained from the lattice basis (1, δ) of the unit ideal R by
multiplying by α. If we write α in polar coordinates α = re^{iθ}, then multiplication by α
rotates the complex plane through the angle θ and stretches by the factor r. So all principal
ideals are similar geometric figures. Also, the lattice with basis (α, ½(α + αδ)) is obtained
from the lattice (2, 1 + δ) by multiplying by ½α. All ideals of the second type are geometric
figures similar to the one shown below (see also Figure 13.7.4).

(13.3.4) The Ideal (2, 1 + δ) in the Ring Z[√−5].

Similarity classes of ideals are called ideal classes, and the number of ideal classes is the
class number of R. The theorem asserts that the class number of Z[√−5] is two. Ideal classes
for other quadratic imaginary fields are discussed in Section 13.7.


Theorem 13.3.3 is based on the following simple lemma about lattices: 


Lemma 13.3.5 Let A be a lattice in the complex plane, let r be the minimum absolute value
among nonzero elements of A, and let γ be an element of A. Let n be a positive integer.
The interior of the disk of radius (1/n)r about the point (1/n)γ contains no element of A other than
the center (1/n)γ. The center may lie in A or not.

Proof. If β is an element of A in the interior of the disk, then |β − (1/n)γ| < (1/n)r, which is to
say, |nβ − γ| < r. Moreover, nβ − γ is in A. Since this is an element of absolute value less
than the minimum, nβ − γ = 0. Then β = (1/n)γ is the center of the disk. □

Proof of Theorem 13.3.3. Let α be a nonzero element of an ideal A of minimal absolute value
r. Since A contains α, it contains the principal ideal (α), and if A = (α) we are in the first case.

Suppose that A contains an element β not in the principal ideal (α). The ideal (α) has
the lattice basis B = (α, αδ), so we may choose β to lie in the parallelogram Π(B) of linear
combinations rα + sαδ with 0 ≤ r, s ≤ 1. (In fact, we can choose β so that 0 ≤ r, s < 1. See
Lemma 13.10.2.) Because δ is purely imaginary, the parallelogram is a rectangle. How large
the rectangle is, and how it is situated in the plane, depend on α, but the ratio of the side
lengths is always 1 : √5. We'll be done if we show that β is the midpoint ½(α + αδ) of the
rectangle.

Figure 13.3.6 shows disks of radius r about the four vertices of such a rectangle, and
also disks of radius ½r about three half lattice points, ½αδ, ½(α + αδ), and α + ½αδ. Notice
that the interiors of these seven disks cover the rectangle. (It would be fussy to check this by
algebra. Let's not bother. A glance at the figure makes it clear enough.)

According to Lemma 13.3.5, the only points of the interiors of the disks that can be
elements of A are their centers. Since β is not in the principal ideal (α), it is not a vertex of the
rectangle. So β must be one of the three half lattice points. If β = α + ½αδ, then since α is in
A, ½αδ will be in A too. So we have only two cases to consider: β = ½αδ and β = ½(α + αδ).

This exhausts the information we can get from the fact that A is a lattice. We now use the
fact that A is an ideal. Suppose that ½αδ is in A. Multiplying by δ shows that ½αδ^2 = −(5/2)α is in
A. Then since α is in A, ½α is in A too. This contradicts our choice of α as a nonzero element
of minimal absolute value. So β cannot be equal to ½αδ. The remaining possibility is that β
is the center ½(α + αδ) of the rectangle. If so, we are in the second case of the theorem. □


13.4 IDEAL MULTIPLICATION 


Let R be the ring of integers in an imaginary quadratic number field. As usual, the notation
A = (α, β, ..., γ) means that A is the ideal of R generated by the elements α, β, ..., γ.
It consists of all linear combinations of those elements, with coefficients in the ring.

Since a nonzero ideal A is a lattice, it has a lattice basis (α, β) consisting of two
elements. Every element of A is an integer combination of α and β. We must be careful to
distinguish between the concepts of a lattice basis and a generating set for an ideal. Any
lattice basis generates the ideal, but the converse is false. For instance, a principal ideal is
generated as an ideal by a single element, whereas a lattice basis has two elements.

Dedekind extended the notion of divisibility to ideals using the following definition of
ideal multiplication:

• Let A and B be ideals in a ring R. The product ideal AB consists of all finite sums of products

(13.4.1) Σ_i α_i β_i, with α_i in A and β_i in B.

This is the smallest ideal of R that contains all of the products αβ.


The definition of ideal multiplication may not be quite as simple as one might hope, 
but it works well. Notice that it is a commutative and associative law, and that it has a unit 
element, namely R. (This is one of the reasons that R is called the unit ideal.) 


(13.4.2) AB = BA, A(BC) = (AB)C, AR = RA = A.

We omit the proof of the next proposition, which is true for arbitrary rings.


Proposition 13.4.3 Let A and B be ideals of a ring R. 


(a) Let {α_1, ..., α_m} and {β_1, ..., β_n} be generators for the ideals A and B, respectively.
The product ideal AB is generated as ideal by the mn products α_i β_j: Every element of
AB is a linear combination of these products with coefficients in the ring.

(b) The product of principal ideals is principal: If A = (α) and B = (β), then AB is the
principal ideal (αβ) generated by the product αβ.

(c) Assume that A = (α) is a principal ideal and let B be arbitrary. Then AB is the set of
products αβ with β in B: AB = αB. □

We go back to the example of the ring R = Z[δ] with δ^2 = −5, in which

(13.4.4) 2 · 3 = 6 = (1 + δ)(1 − δ).

If factoring in R were unique, there would be an element γ in R dividing both 2 and 1 + δ,
and then 2 and 1 + δ would be in the principal ideal (γ). There is no such element. However,
there is an ideal that contains 2 and 1 + δ, namely the ideal (2, 1 + δ) generated by these two
elements, the one depicted in Figure 13.3.4.

We can make four ideals using the factors of 6: 


(13.4.5) A = (2, 1 + δ), Ā = (2, 1 − δ), B = (3, 1 + δ), B̄ = (3, 1 − δ).

In each of these ideals, the generators that are given happen to form lattice bases. We denote
the last of them by B̄ because it is the complex conjugate of B:

(13.4.6) B̄ = {β̄ | β ∈ B}.

It is obtained by reflecting B about the real axis. The fact that R̄ = R implies that the
complex conjugate of an ideal is an ideal. The ideal Ā, the complex conjugate of A, is equal
to A. This accidental symmetry of the lattice A doesn't occur very often.

We now compute some product ideals. Proposition 13.4.3(a) tells us that the ideal ĀA
is generated by the four products of the generators (2, 1 − δ) and (2, 1 + δ) of Ā and A:

ĀA = (4, 2 + 2δ, 2 − 2δ, 6).

Each of the four generators is divisible by 2, so ĀA is contained in the principal ideal (2).
(The notation (2) stands for the ideal 2R here.) On the other hand, 2 is an element of ĀA
because 2 = 6 − 4. Therefore (2) ⊂ ĀA. This shows that ĀA = (2).

Next, the product AB is generated by four products:

AB = (6, 2 + 2δ, 3 + 3δ, (1 + δ)^2).

Each of these four elements is divisible by 1 + δ, and 1 + δ is the difference of two of them,
so it is an element of AB. Therefore AB is equal to the principal ideal (1 + δ). One sees
similarly that ĀB̄ = (1 − δ) and that B̄B = (3).
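
These product computations can be double-checked by treating each ideal of Z[δ], δ^2 = −5, as a lattice in Z^2 and reducing a spanning set to a canonical basis. The sketch below is one way to do this and is not the method of the text; elements x + yδ are stored as pairs (x, y), and the helper names `hnf`, `ideal`, and `product` are hypothetical. Two ideals are equal exactly when their reduced bases agree.

```python
from math import gcd

D = -5  # delta^2 = D


def mul(u, v):
    """Multiply a + b*delta and c + e*delta."""
    (a, b), (c, e) = u, v
    return (a * c + D * b * e, a * e + b * c)


def hnf(vectors):
    """Canonical basis ((a, 0), (b, c)) of the lattice spanned by integer pairs,
    with c > 0 and 0 <= b < a (assuming the lattice has rank 2)."""
    v, a = (0, 0), 0
    for w in vectors:
        # Euclidean algorithm on the second coordinates of v and w.
        while w[1] != 0:
            q = v[1] // w[1]
            v, w = w, (v[0] - q * w[0], v[1] - q * w[1])
        a = gcd(a, w[0])  # w now lies on the first coordinate axis
    if v[1] < 0:
        v = (-v[0], -v[1])
    if a != 0:
        v = (v[0] % a, v[1])
    return ((a, 0), v)


def ideal(gens):
    """Lattice of the ideal generated by gens: spanned by each g and g*delta."""
    spanning = []
    for g in gens:
        spanning += [g, mul(g, (0, 1))]
    return hnf(spanning)


def product(gens_a, gens_b):
    """Product ideal, generated by all pairwise products (Proposition 13.4.3(a))."""
    return ideal([mul(a, b) for a in gens_a for b in gens_b])


A, A_bar = [(2, 0), (1, 1)], [(2, 0), (1, -1)]
B, B_bar = [(3, 0), (1, 1)], [(3, 0), (1, -1)]
print(product(A_bar, A) == ideal([(2, 0)]))
print(product(A, B) == ideal([(1, 1)]))
```

Both comparisons print `True`, in agreement with ĀA = (2) and AB = (1 + δ). The same helpers can be used to compare `product(A_bar, B_bar)` with `ideal([(1, -1)])` and `product(B_bar, B)` with `ideal([(3, 0)])`.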

The principal ideal (6) is the product of four ideals: 


(13.4.7) (6) = (2)(3) = (ĀA)(B̄B) = (ĀB̄)(AB) = (1 − δ)(1 + δ)

Isn't this beautiful? The ideal factorization (6) = ĀAB̄B has provided a common refinement
of the two factorizations (13.4.4).


In the next section, we prove unique factorization of ideals in the ring of integers of 
any imaginary quadratic number field. The next lemma is the tool that we will need. 


Lemma 13.4.8 Main Lemma. Let R be the ring of integers in an imaginary quadratic number
field. The product of a nonzero ideal A of R and its conjugate Ā is a principal ideal, generated
by a positive ordinary integer n: ĀA = (n) = nR.

This lemma would be false for any ring smaller than R, for example, if one didn't include
the elements with half integer coefficients, when d ≡ 1 modulo 4.

Proof. Let (α, β) be a lattice basis for the ideal A. Then (ᾱ, β̄) is a lattice basis for Ā.
Moreover, Ā and A are generated as ideals by these bases, so the four products ᾱα, ᾱβ,
β̄α, and β̄β generate the product ideal ĀA. The three elements ᾱα, β̄β, and β̄α + ᾱβ are
in ĀA. They are algebraic integers equal to their complex conjugates, so they are rational
numbers, and therefore ordinary integers (13.1.1). Let n be their greatest common divisor in
the ring of integers. It is an integer combination of those elements, so it is also an element of
ĀA. Therefore (n) ⊂ ĀA. If we show that n divides each of the four generators of ĀA in
R, it will follow that (n) = ĀA, and this will prove the lemma.

By construction, n divides ᾱα and β̄β in Z, hence in R. We have to show that n divides
ᾱβ and β̄α. How can we do this? There is a beautiful insight here. We use the definition of
an algebraic integer. If we show that the quotients γ = ᾱβ/n and γ̄ = β̄α/n are algebraic
integers, it will follow that they are elements of the ring of integers, which is R. This will
mean that n divides ᾱβ and β̄α in R.

The elements γ and γ̄ are roots of the polynomial p(x) = x^2 − (γ̄ + γ)x + (γ̄γ):

γ̄ + γ = (β̄α + ᾱβ)/n and γ̄γ = (β̄α/n)(ᾱβ/n) = (ᾱα/n)(β̄β/n).

By its definition, n divides each of the three integers β̄α + ᾱβ, ᾱα, and β̄β. The coefficients
of p(x) are integers, so γ and γ̄ are algebraic integers, as we hoped. (See Lemma 12.4.2 for
the case that γ happens to be a rational number.) □
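
The integer n in this proof is computable directly from a lattice basis. The sketch below evaluates it in the ring Z[δ] with δ^2 = −5, storing x + yδ as the pair (x, y); it is an illustration only, and the function name `main_lemma_n` is hypothetical.

```python
from math import gcd

D = -5  # delta^2 = D


def conj(u):
    """Complex conjugate of a + b*delta."""
    return (u[0], -u[1])


def mul(u, v):
    """Multiply a + b*delta and c + e*delta."""
    (a, b), (c, e) = u, v
    return (a * c + D * b * e, a * e + b * c)


def main_lemma_n(alpha, beta):
    """gcd of the three integers conj(alpha)*alpha, conj(beta)*beta and
    conj(beta)*alpha + conj(alpha)*beta, for a lattice basis (alpha, beta)."""
    aa = mul(conj(alpha), alpha)
    bb = mul(conj(beta), beta)
    ba = mul(conj(beta), alpha)
    ab = mul(conj(alpha), beta)
    trace = (ba[0] + ab[0], ba[1] + ab[1])
    # All three elements equal their conjugates, so the delta-coefficient is 0.
    assert aa[1] == 0 and bb[1] == 0 and trace[1] == 0
    return gcd(gcd(aa[0], bb[0]), trace[0])


print(main_lemma_n((2, 0), (1, 1)))   # A = (2, 1 + delta)
print(main_lemma_n((3, 0), (1, 1)))   # B = (3, 1 + delta)
```

For the lattice bases of A and B from (13.4.5) this gives n = 2 and n = 3, matching the products ĀA = (2) and B̄B = (3) computed earlier in the section.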


Our first applications of the Main Lemma are to divisibility of ideals. In analogy with 
divisibility of elements of a ring, we say that an ideal A divides another ideal B if there is an 
ideal C such that B is the product ideal AC. 


Corollary 13.4.9 Let R be the ring of integers in an imaginary quadratic number field.

(a) Cancellation Law: Let A, B, C be nonzero ideals of R. Then AB = AC if and only if
B = C. Similarly, AB ⊂ AC if and only if B ⊂ C, and AB < AC if and only if B < C.

(b) Let A and B be nonzero ideals of R. Then A ⊃ B if and only if A divides B, i.e., if and
only if there is an ideal C such that B = AC.

Proof. (a) It is clear that if B = C, then AB = AC. If AB = AC, then ĀAB = ĀAC. By
the Main Lemma, ĀA = (n), so nB = nC. Dividing by n shows that B = C. The other
assertions are proved in the same way.

(b) We first consider the case that a principal ideal (n) generated by an ordinary integer n
contains an ideal B. Then n divides every element of B in R. Let C = n^{-1}B be the set of
quotients, the set of elements n^{-1}β with β in B. You can check that C is an ideal and that
nC = B. Then B is the product ideal (n)C, so (n) divides B.

Now suppose that an ideal A contains B. We apply the Main Lemma again: ĀA = (n).
Then (n) = ĀA contains ĀB. By what has been shown, there is an ideal C such that
ĀB = (n)C = ĀAC. By the Cancellation Law, B = AC.

Conversely, if A divides B, say B = AC, then B = AC ⊂ AR = A. □


13.5 FACTORING IDEALS 


We show in this section that nonzero ideals in rings of integers in imaginary quadratic fields 
factor uniquely. This follows rather easily from the Main Lemma 13.4.8 and its Corollary 
13.4.9, but before deriving it, we define the concept of a prime ideal. We do this to be consistent 
with standard terminology: the prime ideals that appear are simply the maximal ideals. 


Proposition 13.5.1 Let R be a ring. The following conditions on an ideal P of R are 
equivalent. An ideal that satisfies these conditions is called a prime ideal. 

(a) The quotient ring R/P is an integral domain. 

(b) P ≠ R, and if a and b are elements of R such that ab ∈ P, then a ∈ P or b ∈ P. 

(c) P ≠ R, and if A and B are ideals of R such that AB ⊂ P, then A ⊂ P or B ⊂ P. 


Condition (b) explains the term “prime.” It mimics the important property of a prime 
integer, that if a prime p divides a product ab of integers, then p divides a or p divides b. 
Proof. (a) ⇔ (b): The conditions for R/P to be an integral domain are that R/P ≠ {0} 
and that $\bar{a}\bar{b} = 0$ implies $\bar{a} = 0$ or $\bar{b} = 0$. These conditions translate to P ≠ R and ab ∈ P implies 
a ∈ P or b ∈ P. 


(b) ⇒ (c): Suppose that ab ∈ P implies a ∈ P or b ∈ P, and let A and B be ideals such that 
AB ⊂ P. If A ⊄ P, there is an element a in A that isn't in P. Let b be any element of B. 
Then ab is in AB and therefore in P. But a is not in P, so b is in P. Since b was an arbitrary 
element of B, B ⊂ P. 


(c) ⇒ (b): Suppose that P has the property (c), and let a and b be elements of R such that 
ab is in P. The principal ideal (ab) is the product ideal (a)(b). If ab ∈ P, then (ab) ⊂ P, 
and so (a) ⊂ P or (b) ⊂ P. This tells us that a ∈ P or b ∈ P. □ 


Corollary 13.5.2 Let R be a ring. 


(a) The zero ideal of R is a prime ideal if and only if R is an integral domain. 
(b) A maximal ideal of R is a prime ideal. 
(c) A principal ideal (α) is a prime ideal of R if and only if α is a prime element of R. 


Proof. (a) This follows from (13.5.1)(a), because the quotient ring R/(0) is isomorphic to R. 


(b) This also follows from (13.5.1)(a), because when M is a maximal ideal, R/M is a field. 
A field is an integral domain, so M is a prime ideal. Finally, (c) restates (13.5.1)(b) for a 
principal ideal. □ 


This completes our discussion of prime ideals in arbitrary rings, and we go back to the 
ring of integers in an imaginary quadratic number field. 


Corollary 13.5.3 Let R be the ring of integers in an imaginary quadratic number field, let A 
and B be ideals of R, and let P be a prime ideal of R that is not the zero ideal. If P divides 
the product ideal AB, then P divides one of the factors A or B. 


This follows from (13.5.1)(c) when we use (13.4.9)(b) to translate inclusion into divisibility. □ 


Lemma 13.5.4 Let R be the ring of integers in an imaginary quadratic number field, and let 
B be a nonzero ideal of R. Then 

(a) B has finite index in R, 

(b) there are finitely many ideals of R that contain B, 

(c) B is contained in a maximal ideal, and 

(d) B is a prime ideal if and only if it is a maximal ideal. 


Proof. (a) is Lemma 13.10.3(d), and (b) follows from Corollary 13.10.5. 
(c) Among the finitely many ideals that contain B, there must be at least one that is maximal. 


(d) Let P be a nonzero prime ideal. Then by (a), P has finite index in R. So R/P is a 
finite integral domain. A finite integral domain is a field. (This is Chapter 11, Exercise 7.1.) 
Therefore P is a maximal ideal. The converse is (13.5.2)(b). □ 
Theorem 13.5.5 Let R be the ring of integers in an imaginary quadratic field F. Every 
proper ideal of R is a product of prime ideals. The factorization of an ideal into prime ideals 
is unique except for the ordering of the factors. 


Proof. If an ideal B is a maximal ideal, it is itself a prime ideal. Otherwise, there is an ideal 
A that properly contains B. Then A divides B, say B = AC. The cancellation law shows 
that C properly contains B too. We continue by factoring A and C. Since only finitely many 
ideals contain B, the process terminates, and when it does, all factors will be maximal and 
therefore prime. 


If $P_1 \cdots P_r = Q_1 \cdots Q_s$, with $P_i$ and $Q_j$ prime, then $P_1$ divides $Q_1 \cdots Q_s$, and 
therefore $P_1$ divides one of the factors, say $Q_1$. Then $P_1$ contains $Q_1$, and since $Q_1$ is 
maximal, $P_1 = Q_1$. The uniqueness of factorization follows by induction when one cancels 
$P_1$ from both sides of the equation. □ 


Note: This theorem extends to rings of algebraic integers in other number fields, but it is a 
very special property. Most rings do not admit unique factorization of ideals. The reason is 
that in most rings, P ⊃ B does not imply that P divides B, and then the analogy between 
prime ideals and prime elements is weaker. 


Theorem 13.5.6 The ring of integers R in an imaginary quadratic number field is a unique 
factorization domain if and only if it is a principal ideal domain, and this is true if and only if 
the class group C of R (see (13.7.3)) is the trivial group. 


Proof. A principal ideal domain is a unique factorization domain (12.2.14). Conversely, 
suppose that R is a unique factorization domain. We must show that every ideal is principal. 
Since the product of principal ideals is principal and since every nonzero ideal is a product 
of prime ideals, it suffices to show that every nonzero prime ideal is principal. 


Let P be a nonzero prime ideal of R, and let α be a nonzero element of P. Then α is 
a product of irreducible elements, and because R has unique factorization, they are prime 
elements (12.2.14). Since P is a prime ideal, P contains one of the prime factors of α, say π. 
Then P contains the principal ideal (π). But since π is a prime element, the principal ideal 
(π) is a nonzero prime ideal, and therefore a maximal ideal. Since P contains (π), P = (π). 
So P is a principal ideal. □ 


13.6 PRIME IDEALS AND PRIME INTEGERS 


In Section 12.5, we saw how Gauss primes are related to integer primes. A similar analysis 
can be made for the ring R of integers in a quadratic number field, but we should speak of 
prime ideals rather than of prime elements. This complicates the analogues of some parts of 
Theorem 12.5.2. We consider only those parts that extend directly. 


Theorem 13.6.1 Let R be the ring of integers in an imaginary quadratic number field. 
(a) Let P be a nonzero prime ideal of R. Say that $\bar{P}P = (n)$ where n is a positive integer. 
Then n is either an integer prime or the square of an integer prime. 


(b) Let p be an integer prime. The principal ideal (p) = pR is either a prime ideal, or the 
product $\bar{P}P$ of a prime ideal and its conjugate. 
(c) Assume that d ≡ 2 or 3 modulo 4. An integer prime p generates a prime ideal (p) of R 
if and only if d is not a square modulo p, and this is true if and only if the polynomial 
$x^2 - d$ is irreducible in $\mathbb{F}_p[x]$. 

(d) Assume that d ≡ 1 modulo 4, and let $h = \tfrac14(1 - d)$. An integer prime p generates a 
prime ideal (p) of R if and only if the polynomial $x^2 - x + h$ is irreducible in $\mathbb{F}_p[x]$. 


Corollary 13.6.2 With the notation as in the theorem, any proper ideal strictly larger than 
(p) is a prime, and therefore a maximal, ideal. O 


• An integer prime p is said to remain prime if the principal ideal (p) = pR is a prime ideal. 
Otherwise, the principal ideal (p) is a product $\bar{P}P$ of a prime ideal and its conjugate, and in 
this case the prime p is said to split. If in addition $P = \bar{P}$, the prime p is said to ramify. 


Going back to the case d = -5, the prime 2 ramifies in $\mathbb{Z}[\sqrt{-5}]$ because $(2) = \bar{A}A$ and 
$A = \bar{A}$. The prime 3 splits. It does not ramify, because $(3) = \bar{B}B$ and $B \ne \bar{B}$ (see (13.4.5)). 


Proof of Theorem 13.6.1. The proof follows that of Theorem 12.5.2 closely, so we omit the 
proofs of (a) and (b). We discuss (c) in order to review the reasoning. Suppose d ≡ 2 or 3 
modulo 4. Then R = Z[δ] is isomorphic to the quotient ring $\mathbb{Z}[x]/(x^2 - d)$. A prime integer 
p remains prime in R if and only if $\tilde{R} = R/(p)$ is a field. (We are using a tilde here to avoid 
confusion with complex conjugation.) This leads to the diagram 


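(13.6.3)
$$\begin{array}{ccc}
\mathbb{Z}[x] & \xrightarrow{\ \text{kernel } (p)\ } & \mathbb{F}_p[x] \\
\Big\downarrow {\scriptstyle \text{kernel } (x^2 - d)} & & \Big\downarrow {\scriptstyle \text{kernel } (x^2 - d)} \\
\mathbb{Z}[\delta] & \xrightarrow{\ \text{kernel } (p)\ } & \tilde{R}
\end{array}$$


This diagram shows that $\tilde{R}$ is a field if and only if $x^2 - d$ is irreducible in $\mathbb{F}_p[x]$. 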


The proof of (d) is similar. □ 
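
Parts (c) and (d) of Theorem 13.6.1 reduce the question to counting roots of a quadratic polynomial modulo p, which is easy to try by machine. The following is a minimal sketch, not part of the text: the function names are made up for illustration, and it assumes d is a square-free negative integer. It also uses a standard fact that the text above does not prove, namely that a repeated root modulo p corresponds to ramification.

```python
def defining_polynomial(d):
    """Return (b, c) such that x^2 + b*x + c is the polynomial of Theorem 13.6.1."""
    if d % 4 == 1:
        # d = 1 modulo 4: x^2 - x + h with h = (1 - d)/4
        return (-1, (1 - d) // 4)
    # d = 2 or 3 modulo 4: x^2 - d
    return (0, -d)


def roots_mod_p(coeffs, p):
    """Distinct roots in F_p of the monic quadratic x^2 + b*x + c."""
    b, c = coeffs
    return [x for x in range(p) if (x * x + b * x + c) % p == 0]


def prime_behavior(d, p):
    """Classify the integer prime p in the ring of integers of Q[sqrt(d)].

    No root: the polynomial is irreducible mod p, so p remains prime.
    Two distinct roots: p splits.  One (repeated) root: p ramifies
    (assumption: standard fact, not proved in the surrounding text).
    """
    count = len(roots_mod_p(defining_polynomial(d), p))
    return {0: "remains prime", 1: "ramifies", 2: "splits"}[count]


if __name__ == "__main__":
    for p in (2, 3, 11):
        print(p, prime_behavior(-5, p))
```

For d = -5 this agrees with the discussion above: 2 ramifies and 3 splits.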


Proposition 13.6.4 Let A, B, C be nonzero ideals with B ⊃ C. The index [B:C] of C in B 
is equal to the index [AB:AC]. 


Proof. Since A is a product of prime ideals, it suffices to show that [B:C] = [PB: PC] when 
P is a nonzero prime ideal. The lemma for an arbitrary ideal A follows when we multiply by 
one prime ideal at a time. 

There is a prime integer p such that either P = (p) or $\bar{P}P = (p)$ (13.6.1). If P is the 
principal ideal (p), the formula to be shown is [B:C] = [pB:pC], and this is rather obvious 
(see (13.10.3)(c)). 

Suppose that $(p) = \bar{P}P$. We inspect the chain of ideals $B \supset PB \supset \bar{P}PB = pB$. 
The cancellation law shows that the inclusions are strict, and $[B:pB] = p^2$. Since 
$[B:pB] = [B:PB][PB:pB]$ and both factors are greater than 1, it follows that 
[B:PB] = p. Similarly, [C:PC] = p (13.10.3)(b). The diagram below, together with the 
multiplicative property of the index (2.8.14), shows that [B:C] = [PB:PC]. 

$$\begin{array}{ccc}
B & \supset & C \\
\cup & & \cup \\
PB & \supset & PC
\end{array}$$ □ 


13.7 IDEAL CLASSES 


As before, R denotes the ring of integers in an imaginary quadratic number field. We 
have seen that R is a principal ideal domain if and only if it is a unique factorization 
domain (13.5.6). We define an equivalence relation on nonzero ideals that is compatible with 
multiplication of ideals, and such that the principal ideals form one equivalence class. 


• Two nonzero ideals A and A′ of R are similar if, for some complex number λ, 
(13.7.1) A′ = λA. 


Similarity of ideals is an equivalence relation whose geometric interpretation was mentioned 
before: A and A’ are similar if and only if, when regarded as lattices in the complex plane, they 
are similar geometric figures, by a similarity that is orientation-preserving. To see this, we 
note that a lattice looks the same at all of its points. So a geometric similarity can be assumed 
to relate the element 0 of A to the element 0 of A’. Then it will be described as a rotation 
followed by a stretching or shrinking, that is, as multiplication by a complex number λ. 


• Similarity classes of ideals are called ideal classes. The class of an ideal A will be denoted 
by (A). 


Lemma 13.7.2 The class (R) of the unit ideal consists of the principal ideals. 


Proof. If (A) = (R), then A = λR for some complex number λ. Since 1 is in R, λ is an 
element of A, and therefore an element of R. Then A is the principal ideal (λ). □ 


We saw in (13.3.3) that there are two ideal classes in the ring R = Z[δ], when δ² = -5. 
Both of the ideals A = (2, 1 + δ) and B = (3, 1 + δ) represent the class of nonprincipal 
ideals. They are shown below, in Figure 13.7.4. Rectangles have been put into the figure to 
help you visualize the fact that the two lattices are similar geometric figures. 

We see below (Theorem 13.7.10) that there are always finitely many ideal classes. The 
number of ideal classes in R is called the class number of R. 


Proposition 13.7.3 The ideal classes form an abelian group C, the class group of R, the law 
of composition being defined by multiplication of ideals: (A)(B) = (AB): 


(class of A)(class of B) = (class of AB). 

Proof. Suppose that (A) = (A′) and (B) = (B′), i.e., A′ = λA and B′ = γB for some 
complex numbers λ and γ. Then A′B′ = λγAB, and therefore (AB) = (A′B′). This shows 
that the law of composition is well defined. The law is commutative and associative because 
multiplication of ideals is commutative and associative, and the class (R) of the unit ideal is 
an identity element that we denote by 1, as usual. The only group axiom that isn't obvious 
is that every class (A) has an inverse. But this follows from the Main Lemma, which asserts 
that $\bar{A}A$ is a principal ideal (n). Since the class of a principal ideal is 1, $(\bar{A})(A) = 1$ and 
$(\bar{A}) = (A)^{-1}$. □ 


The class number is thought of as a way to quantify how badly unique factorization 
of elements fails. More precise information is given by the structure of C as a group. As we 
have seen, the class number of the ring $R = \mathbb{Z}[\sqrt{-5}]$ is two. The class group of R has order 
two. One consequence of this is that the product of any two nonprincipal ideals of R is a 
principal ideal. We saw several examples of this in (13.4.7). 


(13.7.4) The Ideals A = (2, 1 + δ) and B = (3, 1 + δ), δ² = -5. 


Measuring an Ideal 


The Main Lemma tells us that if A is a nonzero ideal, then $\bar{A}A = (n)$ is the principal 
ideal generated by a positive integer. That integer is defined to be the norm of A. It will be 
denoted by N(A): 


(13.7.5) N(A) = n, if n is the positive integer such that $\bar{A}A = (n)$. 


The norm of an ideal is analogous to the norm of an element. As is true for norms of 
elements, this norm is multiplicative. 


Lemma 13.7.6 If A and B are nonzero ideals, then N(AB) = N(A)N(B). Moreover, the 
norm of the principal ideal (α) is equal to N(α), the norm of the element α. 


Proof. Say that N(A) = m and N(B) = n. This means that $\bar{A}A = (m)$ and $\bar{B}B = (n)$. 
Then $(\overline{AB})(AB) = (\bar{A}A)(\bar{B}B) = (m)(n) = (mn)$. So N(AB) = mn. 
Next, suppose that A is the principal ideal (α), and let $n = N(\alpha)\ (= \bar\alpha\alpha)$. Then 
$\bar{A}A = (\bar\alpha)(\alpha) = (\bar\alpha\alpha) = (n)$, so N(A) = n too. □ 
We now have four ways to measure the size of an ideal A: 


• the norm N(A), 

• the index [R:A] of A in R, 

• the area Δ(A) of the parallelogram spanned by a lattice basis for A, 

• the minimum value taken on by the norm N(α) of the nonzero elements α of A. 


The relations among these measures are given by Theorem 13.7.8 below. To state that 
theorem, we need a peculiar number: 


(13.7.7)
$$\mu = \begin{cases} 2\sqrt{|d|/3} & \text{if } d \equiv 2 \text{ or } 3 \pmod 4, \\ \sqrt{|d|/3} & \text{if } d \equiv 1 \pmod 4. \end{cases}$$


Theorem 13.7.8 Let R be the ring of integers in an imaginary quadratic number field, and 
let A be a nonzero ideal of R. Then 
(a) $N(A) = [R:A] = \dfrac{\Delta(A)}{\Delta(R)}$. 
(b) If α is a nonzero element of A of minimal norm, N(α) ≤ N(A)μ. 


The most important point about (b) is that the coefficient μ doesn't depend on the ideal. 


Proof. (a) We refer to Proposition 13.10.6 for the proof that $[R:A] = \Delta(A)/\Delta(R)$. In outline, the 
proof that N(A) = [R:A] is as follows. Reference numbers have been put over the equality 
symbols. Let n = N(A). Then 


$$n^2 \overset{1}{=} [R:nR] \overset{2}{=} [R:\bar{A}A] \overset{3}{=} [R:A][A:\bar{A}A] \overset{4}{=} [R:A][R:\bar{A}] \overset{5}{=} [R:A]^2.$$


The equality labeled 1 is Lemma 13.10.3(b), the one labeled 2 is the Main Lemma, which 
says that $nR = \bar{A}A$, and 3 is the multiplicative property of the index. The equality 4 follows 
from Proposition 13.6.4: $[A:\bar{A}A] = [RA:\bar{A}A] = [R:\bar{A}]$. Finally, the ring R is equal to 
its complex conjugate $\bar{R}$, and 5 comes down to the fact that $[\bar{R}:\bar{A}] = [R:A]$. 


(b) When d ≡ 2, 3 modulo 4, R has the lattice basis (1, δ), and when d ≡ 1 modulo 4, R has 
the lattice basis (1, η). The area Δ(R) of the parallelogram spanned by this basis is 


(13.7.9)
$$\Delta(R) = \begin{cases} \sqrt{|d|} & \text{if } d \equiv 2 \text{ or } 3 \text{ modulo } 4, \\ \tfrac12\sqrt{|d|} & \text{if } d \equiv 1 \text{ modulo } 4. \end{cases}$$

So $\mu = \frac{2}{\sqrt3}\Delta(R)$. The length of the shortest vector in a lattice is estimated in Lemma 
13.10.8: $N(\alpha) \le \frac{2}{\sqrt3}\Delta(A)$. We substitute $\Delta(A) = N(A)\Delta(R)$ from part (a) into this 
inequality, obtaining N(α) ≤ N(A)μ. □ 
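
Part (a) of Theorem 13.7.8 can be checked numerically for small ideals. The sketch below is an illustration only, with a hypothetical function name. It assumes d ≡ 2 or 3 modulo 4, so that (1, δ) is a lattice basis of R and an element a + bδ can be stored as the integer pair (a, b). It computes the index [R:A] as the gcd of the 2×2 determinants formed from a spanning set of A, a standard fact about sublattices of Z² that is not taken from the text.

```python
from itertools import combinations
from math import gcd


def ideal_index(d, generators):
    """Index [R:A] of the ideal A of R = Z[delta] generated by `generators`.

    Elements a + b*delta are pairs (a, b), with delta^2 = d and
    d = 2 or 3 modulo 4.
    """
    # As an additive group, A is spanned by g and delta*g for each generator g.
    vectors = []
    for a, b in generators:
        vectors.append((a, b))       # g = a + b*delta
        vectors.append((b * d, a))   # delta*g = b*d + a*delta
    # Index of the spanned sublattice of Z^2: gcd of all 2x2 determinants.
    index = 0
    for (a1, b1), (a2, b2) in combinations(vectors, 2):
        index = gcd(index, a1 * b2 - a2 * b1)
    return index


if __name__ == "__main__":
    print(ideal_index(-5, [(2, 0), (1, 1)]))  # A = (2, 1 + delta)
    print(ideal_index(-5, [(3, 0), (1, 1)]))  # B = (3, 1 + delta)
    print(ideal_index(-5, [(1, 1)]))          # principal ideal (1 + delta)
```

For δ² = -5 the three indices are 2, 3, and 6, which match the norms of the ideals A and B and the norm N(1 + δ) = 6 of the element 1 + δ.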
Theorem 13.7.10 


(a) Every ideal class contains an ideal A with norm N(A) ≤ μ. 

(b) The class group C is generated by the classes of prime ideals P whose norms are prime 
integers p ≤ μ. 

(c) The class group C is finite. 


Proof of Theorem 13.7.10. (a) Let A be an ideal. We must find an ideal in the class (A) 
whose norm is at most μ. We choose a nonzero element α in A, with N(α) ≤ N(A)μ. Then 
A contains the principal ideal (α), so A divides (α), i.e., (α) = AC for some ideal C, and 
N(A)N(C) = N(α) ≤ N(A)μ. Therefore N(C) ≤ μ. Now since AC is a principal ideal, 
$(C) = (A)^{-1} = (\bar{A})$. This shows that the class $(\bar{A})$ contains an ideal, namely C, whose norm 
is at most μ. Then the class (A) contains $\bar{C}$, and $N(\bar{C}) = N(C) \le \mu$. 


(b) Every class contains an ideal A of norm N(A) ≤ μ. We factor A into prime ideals: 
$A = P_1 \cdots P_k$. Then $N(A) = N(P_1) \cdots N(P_k)$, so $N(P_i) \le \mu$ for each i. The classes of 
prime ideals with norm ≤ μ generate C. The norm of a prime ideal P is either a prime 
integer p or the square $p^2$ of a prime integer. If $N(P) = p^2$, then P = (p) (13.6.1). This is 
a principal ideal, and its class is trivial. We may ignore those primes. 


(c) We show that there are finitely many ideals A with norm N(A) ≤ μ. If we write such an 
ideal as a product of prime ideals, $A = P_1 \cdots P_k$, and if $m_i = N(P_i)$, then $m_1 \cdots m_k \le \mu$. 
There are finitely many sets of integers $m_i$, each a prime or the square of a prime, that satisfy 
this inequality, and there are at most two prime ideals with norms equal to a given integer 
$m_i$. So there are finitely many sets of prime ideals such that $N(P_1 \cdots P_k) \le \mu$. □ 


13.8 COMPUTING THE CLASS GROUP 


The table below lists a few class groups. In the table, ⌊μ⌋ denotes the floor of μ, the largest 
integer ≤ μ. If n is an integer and if n ≤ μ, then n ≤ ⌊μ⌋. 


   d    ⌊μ⌋   class group 
  -2     1    C1 
  -5     2    C2 
  -7     1    C1 
 -14     4    C4 
 -21     5    C2 × C2 
 -23     2    C3 
 -47     3    C5 
 -71     4    C7 
(13.8.1) Some Class Groups 
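
The ⌊μ⌋ column can be recomputed from (13.7.7) with integer arithmetic only, since ⌊√x⌋ equals the integer square root of ⌊x⌋. This is a small sketch with a hypothetical function name; it assumes d is a square-free negative integer and does not compute the class groups themselves.

```python
from math import isqrt


def floor_mu(d):
    """Floor of the number mu of (13.7.7) for a square-free negative integer d."""
    if d % 4 == 1:
        # d = 1 modulo 4: mu = sqrt(|d| / 3)
        return isqrt(-d // 3)
    # d = 2 or 3 modulo 4: mu = 2 * sqrt(|d| / 3) = sqrt(4|d| / 3)
    return isqrt(-4 * d // 3)


if __name__ == "__main__":
    for d in (-2, -5, -7, -14, -21, -23, -47, -71):
        print(d, floor_mu(d))
```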


To apply Theorem 13.7.10, we examine the prime integers p ≤ ⌊μ⌋. If p splits (or 
ramifies) in R, we include the class of one of its two prime ideal factors in our set of 
generators for the class group. The class of the other prime factor is its inverse. If p remains 
prime, its class is trivial and we discard it. 


Example 13.8.2 d = -163. Since -163 ≡ 1 modulo 4, the ring R of integers is Z[η], where 
$\eta = \tfrac12(1 + \delta)$, and ⌊μ⌋ = 7. We must inspect the primes p = 2, 3, 5, and 7. If p splits, we 
include one of its prime divisors as a generator of the class group. According to Theorem 
13.6.1, an integer prime p remains prime in R if and only if the polynomial $x^2 - x + 41$ is 
irreducible modulo p. This polynomial happens to be irreducible modulo each of the primes 
2, 3, 5, and 7. So the class group is trivial, and R is a unique factorization domain. □ 


For the rest of this section, we consider cases in which d ≡ 2 or 3 modulo 4. In these 
cases, a prime p splits if and only if $x^2 - d$ has a root in $\mathbb{F}_p$. The table below tells us which 
primes need to be examined. 


(13.8.3) Primes Less Than μ, When d ≡ 2 or 3 Modulo 4 


If d = -1 or -2, there are no primes less than μ, so the class group is trivial, and R is a unique 
factorization domain. 
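
The primes that have to be examined for a given d follow directly from the bound μ of (13.7.7). The following sketch lists them; the function name is hypothetical, d is assumed to be a square-free negative integer, and trial division is used only because the bound is small.

```python
from math import isqrt


def primes_to_inspect(d):
    """Prime integers p <= mu, with mu as in (13.7.7)."""
    if d % 4 == 1:
        bound = isqrt(-d // 3)        # floor of sqrt(|d| / 3)
    else:
        bound = isqrt(-4 * d // 3)    # floor of 2 * sqrt(|d| / 3)
    return [
        p for p in range(2, bound + 1)
        if all(p % q for q in range(2, isqrt(p) + 1))  # trial division
    ]


if __name__ == "__main__":
    for d in (-1, -2, -26, -74):
        print(d, primes_to_inspect(d))
```

For d = -1 and d = -2 the list is empty, in agreement with the remark above.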

Let’s suppose that we have determined which of the primes that need to be examined 
split. Then we will have a set of generators for the class group. But to determine its structure 
we still need to determine the relations among these generators. It is best to analyze the 
prime 2 directly. 


Lemma 13.8.4 Suppose that d ≡ 2 or 3 modulo 4. The prime 2 ramifies in R. The prime 
divisor P of the principal ideal (2) is 

P = (2, 1 + δ), if d ≡ 3 modulo 4, 

P = (2, δ), if d ≡ 2 modulo 4. 


The class (P) has order two in the class group unless d = -1 or -2. In those cases, P is a 
principal ideal. In all cases, the given generators form a lattice basis of the ideal P. 


Proof. Let P be as in the statement of the lemma. We compute the product ideal $\bar{P}P$. If 
d ≡ 3 modulo 4, $\bar{P}P = (2, 1 - \delta)(2, 1 + \delta) = (4,\ 2 + 2\delta,\ 2 - 2\delta,\ 1 - d)$, and if d ≡ 2 modulo 
4, $\bar{P}P = (2, -\delta)(2, \delta) = (4,\ 2\delta,\ -d)$. In both cases, $\bar{P}P = (2)$. Theorem 13.6.1 tells us that 
the ideal (2) is either a prime ideal or the product of a prime ideal and its conjugate, so P 
must be a prime ideal. 
We note also that $P = \bar{P}$, so 2 ramifies, $(P) = (P)^{-1}$, and (P) has order 1 or 2 in the 
class group. It will have order 1 if and only if it is a principal ideal. This happens when d = -1 
or -2. If d = -1, P = (1 + δ), and if d = -2, P = (δ). When d < -2, the integer 2 has no 
proper factor in R, and then P is not a principal ideal. □ 


Corollary 13.8.5 If d ≡ 2 or 3 modulo 4 and d < -2, the class number is even. □ 


Example 13.8.6 d = -26. Table 13.8.3 tells us to inspect the primes p = 2, 3, and 5. The 
polynomial $x^2 + 26$ is reducible modulo 2, 3, and 5, so all of those primes split. Let's say that 


$(2) = \bar{P}P$, $(3) = \bar{Q}Q$, and $(5) = \bar{S}S$. 


We have three generators (P), (Q), (S) for the class group, and (P) has order 2. How 
can we determine the other relations among these generators? The secret method is to 
compute norms of a few elements, hoping to get some information. We don't have to look 
far: $N(1 + \delta) = 27 = 3^3$ and $N(2 + \delta) = 30 = 2 \cdot 3 \cdot 5$. 

Let α = 1 + δ. Then $\bar\alpha\alpha = 3^3$. Since $(3) = \bar{Q}Q$, we have the ideal relation 


$(\bar\alpha)(\alpha) = (\bar{Q}Q)^3$. 


Because ideals factor uniquely, the principal ideal (α) is the product of one half of the terms 
on the right, and $(\bar\alpha)$ is the product of the conjugates of those terms. We note that 3 doesn't 
divide α in R. Therefore $\bar{Q}Q = (3)$ doesn't divide (α). It follows that (α) is either $Q^3$ or 
$\bar{Q}^3$. Which it is depends on which prime factor of (3) we label as Q. 

In either case, $(Q)^3 = 1$, and (Q) has order 1 or 3 in the class group. We check that 3 
has no proper divisor in R. Then since Q divides (3), it cannot be a principal ideal. So (Q) 
has order 3. 

Next, let β = 2 + δ. Then $\bar\beta\beta = 2 \cdot 3 \cdot 5$, and this gives us the ideal relation 


$(\bar\beta)(\beta) = \bar{P}P\,\bar{Q}Q\,\bar{S}S$. 


Therefore the principal ideal (β) is the product of one half of the ideals on the right and $(\bar\beta)$ 
is the product of the conjugates of those ideals. We know that $\bar{P} = P$. If we don't care which 
prime factors of (3) and (5) we label as Q and S, we may assume that (β) = PQS. This 
gives us the relation (P)(Q)(S) = 1. 

We have found three relations: 


$(P)^2 = 1$, $(Q)^3 = 1$, and $(P)(Q)(S) = 1$. 


These relations show that $(Q) = (S)^2$, $(P) = (S)^3$, and that (S) has order 6. The class group 
is a cyclic group of order 6, generated by a prime ideal divisor of 5. 
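
The search for useful norms in this example is easy to mechanize. The sketch below is an illustration under stated assumptions, not the text's method in full: the helper names are hypothetical, it assumes d ≡ 2 or 3 modulo 4 so that N(a + bδ) = a² - db², and it looks only at elements of the form a + δ. It reports those whose norms involve no primes other than the ones being inspected.

```python
def norm(a, b, d):
    """Norm of a + b*delta, where delta^2 = d."""
    return a * a - d * b * b


def factor(n):
    """Prime factorization of an integer n > 1 as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def smooth_norms(d, primes, max_a=20):
    """Pairs (a, factorization of N(a + delta)) with all prime factors in `primes`."""
    found = []
    for a in range(max_a + 1):
        f = factor(norm(a, 1, d))
        if set(f) <= set(primes):
            found.append((a, f))
    return found


if __name__ == "__main__":
    for a, f in smooth_norms(-26, [2, 3, 5]):
        print(f"N({a} + delta) =", norm(a, 1, -26), f)
```

For d = -26 the output includes the two elements 1 + δ and 2 + δ used above. Turning such norms into relations in the class group still requires the argument given in the example.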


The next lemma explains why the method of computing norms works. 


Lemma 13.8.7 Let P, Q, S be prime ideals of the ring R of imaginary quadratic integers, 
whose norms are the prime integers p, q, s, respectively. Suppose that the relation 
$(P)^i(Q)^j(S)^k = 1$ holds in the class group C. Then there is an element α in R with norm 
equal to $p^i q^j s^k$. 


Proof. By definition, $(P)^i(Q)^j(S)^k = (P^iQ^jS^k)$. If $(P^iQ^jS^k) = 1$, the ideal $P^iQ^jS^k$ is 
principal, say $P^iQ^jS^k = (\alpha)$. Then 

$(\bar\alpha)(\alpha) = (\bar{P}P)^i(\bar{Q}Q)^j(\bar{S}S)^k = (p)^i(q)^j(s)^k = (p^iq^js^k)$. 

Therefore $N(\alpha) = \bar\alpha\alpha = p^iq^js^k$. □ 
We compute one more class group. 


Example 13.8.8 d = -74. The primes to inspect are 2, 3, 5, and 7. Here 2 ramifies, 3 and 5 
split, and 7 remains prime. Say that $(2) = \bar{P}P$, $(3) = \bar{Q}Q$, and $(5) = \bar{S}S$. Then (P), (Q), 
and (S) generate the class group, and (P) has order 2 (13.8.4). We note that 


$N(1 + \delta) = 75 = 3 \cdot 5^2$ 
$N(4 + \delta) = 90 = 2 \cdot 3^2 \cdot 5$ 
$N(13 + \delta) = 243 = 3^5$ 
$N(14 + \delta) = 270 = 2 \cdot 3^3 \cdot 5$ 


The norm N(13 + δ) shows that $(Q)^5 = 1$, so (Q) has order 1 or 5. Since 3 has no 
proper divisor in R, Q isn't a principal ideal. So (Q) has order 5. Next, N(1 + δ) shows 
that $(S)^2 = (Q)$ or $(\bar{Q})$, and therefore (S) has order 10. We eliminate (Q) from our set of 
generators. Finally, N(4 + δ) gives us one of the relations $(P)(Q)^2(S) = 1$ or $(P)(\bar{Q})^2(S) = 1$. 
Either one allows us to eliminate (P) from our list of generators. The class group is cyclic of 
order 10, generated by a prime ideal divisor of (5). 


13.9 REAL QUADRATIC FIELDS 


We take a brief look at real quadratic number fields, fields of the form Q[√d], where d is a 
square-free positive integer, and we use the field Q[√2] as an example. The ring of integers 
in this field is a unique factorization domain: 


(13.9.1) R = Z[√2] = {a + b√2 | a, b ∈ Z}. 


It can be shown that unique factorization of ideals into prime ideals is true for the ring 
of integers in any real quadratic number field, and that the class number is finite [Cohn], 
[Hasse]. It is conjectured that there are infinitely many values of d for which the ring of 
integers has unique factorization. 


When d is positive, Q[√d] is a subfield of the real numbers. Its ring of integers is not 
embedded as a lattice in the complex plane. However, we can represent R as a lattice in ℝ² 
by associating to the algebraic integer a + b√d the point (u, v) of ℝ², where 


(13.9.2) u = a + b√d, v = a - b√d. 


The resulting lattice is depicted below for the case d = 2. The reason that the hyperbolas 
have been put into the figure will be explained presently. 
Recall that the field Q[√d] is isomorphic to the abstractly constructed field 
(13.9.3) $F = \mathbb{Q}[x]/(x^2 - d)$. 


If we replace Q[√d] by F and denote the residue of x in F by δ, then δ is an abstract square 
root of d rather than the positive real square root, and F is the set of elements a + bδ, with 
a and b in Q. The coordinates u, v represent the two ways that the abstractly defined field 
F can be embedded into the real numbers, namely, u sends δ ⇝ √d and v sends δ ⇝ -√d. 

For α = a + bδ ∈ Q[δ], we denote by α′ the "conjugate" element a - bδ. The norm 
of α is 


(13.9.4) $N(\alpha) = \alpha'\alpha = a^2 - b^2 d$. 
If α is an algebraic integer, then N(α) is an ordinary integer. The norm is multiplicative: 
(13.9.5) N(αβ) = N(α) N(β). 


However, N(α) is not necessarily positive. It isn't equal to $|\alpha|^2$. 


(13.9.6) The Lattice Z[√2]. 


One significant difference between real and imaginary quadratic fields is that the ring 
of integers in a real quadratic field always contains infinitely many units. Since the norm of 
an algebraic integer is an ordinary integer, a unit must have norm ±1, and if N(α) = ±1, 
then the inverse of α is ±α′, so α is a unit. For example, 


(13.9.7) $\alpha = 1 + \sqrt2,\quad \alpha^2 = 3 + 2\sqrt2,\quad \alpha^3 = 7 + 5\sqrt2,\ \ldots$ 


are units in the ring R = Z[√2]. The element α has infinite order in the group of units. 
The condition $N(\alpha) = a^2 - 2b^2 = \pm 1$ for units translates in (u, v)-coordinates to 


(13.9.8) uv = ±1. 
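
The list (13.9.7) and the condition N(α) = ±1 can be verified by direct computation. Here is a short sketch, with hypothetical helper names, that stores a + b√d as the integer pair (a, b) and takes d = 2 by default.

```python
def multiply(x, y, d=2):
    """Product of a + b*sqrt(d) and c + e*sqrt(d), as a pair of integers."""
    a, b = x
    c, e = y
    return (a * c + d * b * e, a * e + b * c)


def norm(x, d=2):
    """Norm a^2 - d*b^2 of a + b*sqrt(d), as in (13.9.4)."""
    a, b = x
    return a * a - d * b * b


def powers(x, count, d=2):
    """The list [x, x^2, ..., x^count]."""
    result, current = [], (1, 0)
    for _ in range(count):
        current = multiply(current, x, d)
        result.append(current)
    return result


if __name__ == "__main__":
    for a, b in powers((1, 1), 5):
        print(f"{a} + {b}*sqrt(2), norm {norm((a, b))}")
```

The norms alternate between -1 and 1, so the successive powers land alternately on the two hyperbolas uv = -1 and uv = 1.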
So the units are the points of the lattice that lie on one of the two hyperbolas uv = 1 and 
uv = -1, the ones depicted in Figure 13.9.6. It is remarkable that the ring of integers in a real 
quadratic field always has infinitely many units or, what amounts to the same thing, that the 
lattice always contains infinitely many points on these hyperbolas. This is far from obvious, 
either algebraically or geometrically, but a few such points are visible in the figure. 


Theorem 13.9.9 Let R be the ring of integers in a real quadratic number field. The group of 
units in R is an infinite group. 


We have arranged the proof as a sequence of lemmas. The first one follows from 
Lemma 13.10.8 in the next section. 


Lemma 13.9.10 For every Δ₀ > 0, there exists an r > 0 with the following property: Let L 
be a lattice in the (u, v)-plane P, let Δ(L) denote the area of the parallelogram spanned 
by a lattice basis, and suppose that Δ(L) ≤ Δ₀. Then L contains a nonzero element γ with 
|γ| ≤ r. □ 


Let Δ₀ and r be as above. For s > 0, we denote by $D_s$ the elliptical disk in the (u, v) 
plane defined by the inequality $s^{-2}u^2 + s^2v^2 \le r^2$. So $D_1$ is the circular disk of radius r. The 
figure below shows three of the disks $D_s$. 


(13.9.11) Elliptical Disks that Contain Points of the Lattice. 


Lemma 13.9.12 With notation as above, let L be a lattice that contains no point on the 
coordinate axes except the origin, and such that Δ(L) ≤ Δ₀. 


(a) For any s > 0, the elliptical disk $D_s$ contains a nonzero element of L. 


(b) For any point α = (u, v) in the disk $D_s$, $|uv| \le \frac{r^2}{2}$. 


Proof. (a) The map $\varphi\colon \mathbb{R}^2 \to \mathbb{R}^2$ defined by $\varphi(x, y) = (sx, s^{-1}y)$ maps $D_1$ to $D_s$. The 
inverse image $L' = \varphi^{-1}L$ of L contains no point on the axes except the origin. We note that 
φ is an area-preserving map, because it multiplies one coordinate by s and the other by $s^{-1}$. 
Therefore Δ(L′) ≤ Δ₀. Lemma 13.9.10 shows that the circular disk $D_1$ contains a nonzero 
element of L′, say γ. Then α = φ(γ) is an element of L in the elliptical disk $D_s$. 


(b) The inequality is true for the circular disk $D_1$. Let φ be the map defined above. If 
α = (u, v) is in $D_s$, then $\varphi^{-1}(\alpha) = (s^{-1}u, sv)$ is in $D_1$, so $|uv| = |(s^{-1}u)(sv)| \le \frac{r^2}{2}$. □ 

Lemma 13.9.13 With the hypotheses of the previous lemma, the lattice L contains infinitely 
many points (u, v) with $|uv| \le \frac{r^2}{2}$. 


Proof. We apply the previous lemma. For large s, the disk $D_s$ is very narrow, and it contains 
a nonzero element of L, say $\alpha_s$. The elements $\alpha_s$ cannot lie on the u-axis but they must 
become arbitrarily close to that axis as s tends to infinity. It follows that there are infinitely 
many points among them, and if $\alpha_s = (u_s, v_s)$, then $|u_s v_s| \le \frac{r^2}{2}$. □ 


Let R be the ring of integers in a real quadratic field, and let n be an integer. We call 
two elements $\beta_1, \beta_2$ of R congruent modulo n if n divides $\beta_1 - \beta_2$ in R. When d ≡ 2 or 3 modulo 
4 and $\beta_i = m_i + n_i\delta$, this simply means that $m_1 \equiv m_2$ and $n_1 \equiv n_2$ modulo n. The same is 
true when d ≡ 1 modulo 4, except that one has to write $\beta_i = m_i + n_i\eta$. In all cases, there are 
$n^2$ congruence classes modulo n. 


Theorem 13.9.9 follows from the next lemma. 


Lemma 13.9.14 Let R be the ring of integers in a real quadratic number field. 


(a) There is a positive integer n such that the set S of elements of R with norm n is infinite. 
Moreover, there are infinitely many pairs of elements of S that are congruent modulo n. 

(b) If two elements $\beta_1$ and $\beta_2$ of R with norm n are congruent modulo n, then $\beta_2/\beta_1$ is a 
unit of R. 


Proof. (a) The lattice R contains no point on the axes other than the origin, because u and 
v aren't zero unless both a and b are zero. If α is an element of R whose image in the 
plane is the point (u, v), then |N(α)| = |uv|. Lemma 13.9.13 shows that R contains infinitely 
many points with norm in a bounded interval. Since there are finitely many integers n in that 
interval, the set of elements of R with norm n is infinite for at least one of them. The fact 
that there are finitely many congruence classes modulo n proves the second assertion. 


(b) We show that $\beta_2/\beta_1$ is in R. The same argument will show that $\beta_1/\beta_2$ is in R, hence that 
$\beta_2/\beta_1$ is a unit. Since $\beta_1$ and $\beta_2$ are congruent, we can write $\beta_2 = \beta_1 + n\gamma$, with γ in R. Let 
$\beta_1'$ be the conjugate of $\beta_1$. So $\beta_1\beta_1' = n$. Then $\beta_2/\beta_1 = (\beta_1 + n\gamma)/\beta_1 = 1 + \beta_1'\gamma$. This is an 
element of R, as claimed. □ 
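
Part (b) of the lemma can be tried out in Z[√2]. The sketch below uses hypothetical helper names and assumes R = Z[√d] with elements stored as integer pairs (a, b). It divides β₂ by β₁ using β₂/β₁ = β₂β₁′/N(β₁). The sample elements 3 + √2 and 437 + 309√2 were chosen for this illustration: both have norm 7, and their difference 434 + 308√2 is divisible by 7.

```python
def norm(x, d):
    """Norm a^2 - d*b^2 of a + b*sqrt(d)."""
    a, b = x
    return a * a - d * b * b


def quotient(beta2, beta1, d):
    """Return beta2/beta1 as an integer pair, or None if it is not in Z[sqrt(d)].

    Uses beta2/beta1 = beta2 * beta1' / N(beta1), beta1' being the conjugate.
    """
    a2, b2 = beta2
    a1, b1 = beta1
    n = norm(beta1, d)
    # Coefficients of beta2 * beta1', where beta1' = a1 - b1*sqrt(d).
    rational_part = a2 * a1 - d * b2 * b1
    sqrt_part = b2 * a1 - a2 * b1
    if rational_part % n or sqrt_part % n:
        return None
    return (rational_part // n, sqrt_part // n)


if __name__ == "__main__":
    beta1, beta2 = (3, 1), (437, 309)
    print(norm(beta1, 2), norm(beta2, 2))   # both norms are 7
    unit = quotient(beta2, beta1, 2)
    print(unit, norm(unit, 2))              # the quotient and its norm
    print(quotient((3, -1), beta1, 2))      # 3 - sqrt(2): same norm, not congruent
```

The quotient is 99 + 70√2, which has norm 1 and is therefore a unit. The conjugate 3 - √2 also has norm 7, but it is not congruent to 3 + √2 modulo 7, and in that case the quotient does not lie in R.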


13.10 ABOUT LATTICES 


A lattice L in the plane ℝ² is generated, or spanned, by a set S if every element of L can 
be written as an integer combination of elements of S. Every lattice L has a lattice basis 
$B = (v_1, v_2)$ consisting of two elements. An element of L is an integer combination of the 
lattice basis vectors in exactly one way (see (6.5.5)). 
Some notation: 
(13.10.1) 


Π(B) : the parallelogram of linear combinations $r_1v_1 + r_2v_2$ with $0 \le r_i \le 1$. 
Its vertices are 0, $v_1$, $v_2$, and $v_1 + v_2$. 
Π′(B) : the set of linear combinations $r_1v_1 + r_2v_2$ with $0 \le r_i < 1$. It is obtained 
by deleting the edges $[v_1, v_1 + v_2]$ and $[v_2, v_1 + v_2]$ from Π(B). 
Δ(L) : the area of Π(B). 
[M:L] : the index of a sublattice L of a lattice M, i.e., the number of additive cosets of L in M. 


We will see that Δ(L) is independent of the lattice basis, so that notation isn't ambiguous. 
The other notation has been introduced before. For reference, we recall Lemma 6.5.8: 


Lemma 13.10.2 Let $B = (v_1, v_2)$ be a basis of ℝ², and let L be the lattice of integer 
combinations of B. Every vector v in ℝ² can be written uniquely in the form $v = w + v_0$, 
with w in L and $v_0$ in Π′(B). □ 


Lemma 13.10.3 Let K ⊂ L ⊂ M be lattices in the plane, and let B be a lattice basis for L. 
Then 


(a) [M:K] = [M:L][L:K]. 

(b) For any positive integer n, $[L:nL] = n^2$. 

(c) For any positive real number r, [M:L] = [rM:rL]. 

(d) [M:L] is finite, and is equal to the number of points of M in the region Π′(B). 
(e) The lattice M is generated by L together with the finite set M ∩ Π′(B). 


Proof. (d),(e) We can write an element x of M uniquely in the form v + y, where v is in L
and y is in Π'(B). Then v is in M, and so y is in M too. Therefore x is in the coset y + L.
This shows that the elements of M ∩ Π'(B) are representative elements for the cosets of L
in M. Since there is only one way to write x = v + y, these cosets are distinct. Since M is
discrete and Π'(B) is a bounded set, M ∩ Π'(B) is finite.


(13.10.4)   [Figure: the points of a lattice L (marked ·) and of its sublattice 3L (marked ×).]
Formula (a) is the multiplicative property of the index (2.8.14). (b) follows from (a),
because the lattice nL is obtained by stretching L by the factor n, as is illustrated above for
the case that n = 3. (c) is true because multiplication by r stretches both lattices by the same
amount. □
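Part (d) of the lemma can be checked by direct counting when M is the integer lattice Z² and L has an integer lattice basis. The following is a minimal sketch under that assumption; the function names are illustrative and do not come from the text. It lists the points of M in Π'(B) by solving (x, y) = r₁v₁ + r₂v₂ exactly with Cramer's Rule and keeping the points with 0 ≤ rᵢ < 1.

```python
from fractions import Fraction


def det2(v1, v2):
    """Determinant of the 2 x 2 matrix whose columns are v1 and v2."""
    return v1[0] * v2[1] - v1[1] * v2[0]


def points_in_half_open_parallelogram(v1, v2):
    """Integer points r1*v1 + r2*v2 with 0 <= r1, r2 < 1, i.e. Z^2 intersected with Pi'(B)."""
    d = det2(v1, v2)
    if d == 0:
        raise ValueError("v1 and v2 must be independent")
    # Bounding box of the parallelogram with vertices 0, v1, v2, v1 + v2.
    xs = [0, v1[0], v2[0], v1[0] + v2[0]]
    ys = [0, v1[1], v2[1], v1[1] + v2[1]]
    points = []
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            # Exact coordinates of (x, y) with respect to the basis (v1, v2).
            r1 = Fraction(x * v2[1] - y * v2[0], d)
            r2 = Fraction(v1[0] * y - v1[1] * x, d)
            if 0 <= r1 < 1 and 0 <= r2 < 1:
                points.append((x, y))
    return points


# L = 3M: nine coset representatives, in agreement with [M:3M] = 3^2.
reps_3M = points_in_half_open_parallelogram((3, 0), (0, 3))

# A skew example: the basis ((2, 1), (1, 3)) spans a parallelogram of area 5.
reps_skew = points_in_half_open_parallelogram((2, 1), (1, 3))
```

In both examples the number of representatives equals the absolute value of the determinant of the basis, which is the area of Π(B).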


Corollary 13.10.5 Let L ⊂ M be lattices in ℝ². There are finitely many lattices between L
and M.


Proof. Let B be a lattice basis for L, and let N be a lattice with L ⊂ N ⊂ M. Lemma
13.10.3(e) shows that N is generated by L and by the set N ∩ Π'(B), which is a subset of the
finite set M ∩ Π'(B). A finite set has finitely many subsets. □


Proposition 13.10.6 If L ⊂ M are lattices in the plane, [M:L] = A(L)/A(M).


Proof. Say that C is the lattice basis (u₁, u₂) of M. Let n be a large positive integer, and let
Mₙ denote the lattice with basis Cₙ = ((1/n)u₁, (1/n)u₂). Let Γ' denote the small region Π'(Cₙ).
Its area is (1/n²)A(M). The translates x + Γ' of Γ' with x in Mₙ cover the plane without
overlap, and there is exactly one element of Mₙ in each translate x + Γ', namely x. (This is
Lemma 13.10.2.)


Let B be a lattice basis for L. We approximate the area of Π(B) in the way
that one approximates a double integral, using translates of Γ'. Let r = [M:L]. Then
[Mₙ:L] = [Mₙ:M][M:L] = n²r. Lemma 13.10.3(d) tells us that the region Π'(B) contains
n²r points of the lattice Mₙ. Since the translates of Γ' cover the plane, the translates by
these n²r points cover Π(B) approximately.


A(L) ≈ n²r A(Mₙ) = r A(M) = [M:L] A(M).


The error in this approximation comes from the fact that Π'(B) is not covered precisely
along its boundary. One can bound this error in terms of the length of the boundary of Π(B)
and the diameter of Γ' (its largest linear dimension). The diameter tends to zero as n → ∞,
and so does the error. □


Corollary 13.10.7 The area A(L) of the parallelogram Π(B) is independent of the lattice
basis B.


This follows when one sets M = L in the previous proposition. □


Lemma 13.10.8 Let v be a nonzero element of minimal length of a lattice L. Then
|v|² ≤ (2/√3) A(L).

The inequality becomes an equality for an equilateral triangular lattice. 


Proof. We choose an element v₁ of L of minimal length. Then v₁ generates the subgroup
L ∩ ℓ, where ℓ is the line spanned by v₁, and there is an element v₂ such that (v₁, v₂) is a
lattice basis of L (see the proof of (6.5.5)). A change of scale changes |v₁|² and A(L) by the
same factor, so we may assume that |v₁| = 1. We position coordinates so that v₁ = (1, 0)ᵗ.


Say that v₂ = (b₁, b₂)ᵗ. We may assume that b₂ is positive. Then A(L) = b₂. We may
also adjust v₂ by adding a multiple of v₁, to make −½ ≤ b₁ ≤ ½, so that b₁² ≤ ¼. Since v₁
has minimal length among nonzero elements of L, |v₂|² = b₁² + b₂² ≥ |v₁|² = 1. Therefore
b₂² ≥ ¾. Thus A(L) = b₂ ≥ √3/2, and |v₁|² = 1 ≤ (2/√3) A(L). □
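

The equality case of the bound |v|² ≤ (2/√3) A(L) can be verified by hand. The computation below is a worked check, assuming the equilateral triangular lattice is scaled so that its shortest nonzero vectors have length 1 and that coordinates are chosen as in the proof.

```latex
v_1 = (1, 0)^t, \qquad v_2 = \left(\tfrac{1}{2}, \tfrac{\sqrt{3}}{2}\right)^t, \qquad
A(L) = \det \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & \tfrac{\sqrt{3}}{2} \end{bmatrix} = \tfrac{\sqrt{3}}{2},
\qquad
|v_1|^2 = 1 = \tfrac{2}{\sqrt{3}} \cdot \tfrac{\sqrt{3}}{2} = \tfrac{2}{\sqrt{3}}\, A(L).
```

Here b₁ = ½ and b₂² = ¾, so both inequalities used in the proof are equalities.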


Nullum vero dubium nobis esse videtur, 
quin multa eaque egregia in hoc genere adhuc lateant 
in quibus alii vires suas exercere possint. 


—Carl Friedrich Gauss 


EXERCISES 


Section 1 Algebraic Integers
1.1. Is ½(1 + √5) an algebraic integer?


1.2. Prove that the integers in Q[√d] form a ring.
1.3. (a) Let α be a complex number that is the root of a monic integer polynomial, not
necessarily an irreducible polynomial. Prove that α is an algebraic integer.
(b) Let α be an algebraic number that is the root of an integer polynomial f(x) =
aₙxⁿ + aₙ₋₁xⁿ⁻¹ + ··· + a₀. Prove that aₙα is an algebraic integer.
(c) Let α be an algebraic integer that is the root of a monic integer polynomial
xⁿ + aₙ₋₁xⁿ⁻¹ + ··· + a₁x + a₀. Prove that α⁻¹ is an algebraic integer if and only
if a₀ = ±1.


1.4. Let d and d' be integers. When are the fields Q(√d) and Q(√d') distinct?


Section 2 Factoring Algebraic Integers 


2.1. Prove that 2, 3, and 1 ± √−5 are irreducible elements of the ring R = Z[√−5] and that the
units of this ring are ±1.


2.2. For which negative integers d ≡ 2 modulo 4 is the ring of integers in Q[√d] a unique
factorization domain? 


Section 3 Ideals in Z[√−5]


3.1. Let α be an element of R = Z[δ], δ = √−5, and let γ = ½(α + αδ). Under what
circumstances is the lattice with basis (α, γ) an ideal?


3.2. Let δ = √−5. Decide whether or not the lattice of integer combinations of the given
vectors is an ideal: (a) (5, 1 + δ), (b) (7, 1 + δ), (c) (4 − 2δ, 2 + 2δ, 6 + 4δ).


3.3. Let A be an ideal of the ring of integers R in an imaginary quadratic field. Prove that
there is a lattice basis for A, one of whose elements is an ordinary positive integer. 


3.4. For each ring R listed below, use the method of Theorem 13.3.3 to describe the ideals in 
R. Make a drawing showing the possible shapes of the lattices in each case. 


(a) R = Z[√−3], (b) R = Z[½(1 + √−3)], (c) R = Z[√−6],
(d) R = Z[½(1 + √−7)], (e) R = Z[√−10]


Section 4 Ideal Multiplication 

4.1. Let R = Z[√−6]. Find a lattice basis for the product ideal AB, where A = (2, δ) and
B = (3, δ).

4.2. Let R be the ring Z[δ], where δ = √−5, and let A denote the ideal generated by the
elements (a) 3 + 5δ, 2 + 2δ, (b) 4 + δ, 1 + 2δ. Decide whether or not the given generators
form a lattice basis for A, and identify the ideal ĀA.

4.3. Let R be the ring Z[δ], where δ = √−5, and let A and B be ideals of the form
A = (α, ½(α + αδ)), B = (β, ½(β + βδ)). Prove that AB is a principal ideal by finding a
generator.


Section 5 Factoring Ideals
5.1. Let R = Z[√−5].
(a) Decide whether or not 11 is an irreducible element of R and whether or not (11) is a
prime ideal of R.
(b) Factor the principal ideal (14) into prime ideals in Z[δ].


5.2. Let δ = √−3 and R = Z[δ]. This is not the ring of integers in the imaginary quadratic
number field Q[δ]. Let A be the ideal (2, 1 + δ).
(a) Prove that A is a maximal ideal, and identify the quotient ring R/A.
(b) Prove that ĀA is not a principal ideal, and that the Main Lemma is not true for this
ring.
(c) Prove that A contains the principal ideal (2) but that A does not divide (2).


5.3. Let f = y² − x³ − x. Is the ring C[x, y]/(f) an integral domain?


Section 6 Prime Ideals and Prime Integers 
6.1. Let d = -14. For each of the primes p = 2, 3,5, 7, 11, and 13, decide whether or not p 
splits or ramifies in R, and if so, find a lattice basis for a prime ideal factor of (p). 


6.2. Suppose that d is a negative integer, and that d ≡ 1 modulo 4. Analyze whether or not 2
remains prime in R in terms of congruence modulo 8. 


6.3. Let R be the ring of integers in an imaginary quadratic field. 
(a) Suppose that an integer prime p remains prime in R. Prove that R/(p) is a field with
p² elements.
(b) Prove that if p splits but does not ramify, then R/(p) is isomorphic to the product
ring Fₚ × Fₚ.
6.4. When d is congruent to 2 or 3 modulo 4, an integer prime p remains prime in the ring of
integers of Q[√d] if the polynomial x² − d is irreducible modulo p.


(a) Prove that this is also true when d ≡ 1 modulo 4 and p ≠ 2.
(b) What happens to p = 2 when d ≡ 1 modulo 4?


6.5. Assume that d is congruent to 2 or 3 modulo 4.
(a) Prove that a prime integer p ramifies in R if and only if p = 2 or p divides d.
(b) Let p be an integer prime that ramifies, and say that (p) = P². Find an explicit lattice
basis for P. In which cases is P a principal ideal?

6.6. Let d be congruent to 2 or 3 modulo 4. An integer prime p might be of the form a² − b²d,
with a and b in Z. How is this related to the prime ideal factorization of (p) in the ring of
integers R?

6.7. Suppose that d ≡ 2 or 3 modulo 4, and that a prime p ≠ 2 does not remain prime in R. Let
a be an integer such that a² ≡ d modulo p. Prove that (p, a + δ) is a lattice basis for a
prime ideal that divides (p).


Section 7 Ideal Classes 
7.1. Let R = Z[√−5], and let B = (3, 1 + δ). Find a generator for the principal ideal B².


7.2. Prove that two nonzero ideals A and A’ in the ring of integers in an imaginary quadratic 
field are similar if and only if there is a nonzero ideal C such that both AC and A’C are 
principal ideals. 


7.3. Let d = −26. With each of the following integers n, decide whether n is the norm of an
element α of R. If it is, find α: n = 75, 250, 375, 5⁶.


7.4. Let R = Z[δ], where δ² = −6.
(a) Prove that the lattices P = (2, δ) and Q = (3, δ) are prime ideals of R.
(b) Factor the principal ideal (6) into prime ideals explicitly in R. 
(c) Determine the class group of R. 


Section 8 Computing the Class Group 
8.1. With reference to Example 13.8.6, since (P) = (S)> and (Q) = (S)?, Lemma 13.8.7 
predicts that there are elements whose norms are 2 - 5° and 32 - 5. Find such elements. 
8.2. With reference to Example 13.8.8, explain why N(4 + δ) and N(14 + δ) don’t lead to
contradictory conclusions.


8.3. Let R = Z[δ], with δ = √−29. In each case, compute the norm, explain what conclusions
one can draw about ideals in R from the norm computation, and determine the class
group of R: N(1 + δ), N(4 + δ), N(5 + δ), N(9 + 2δ), N(11 + 2δ).

8.4. Prove that the values of d listed in Theorem 13.2.5 have unique factorization. 

8.5. Determine the class group and draw the possible shapes of the lattices in each case: 

(a) d = −10, (b) d = −13, (c) d = −14, (d) d = −21.

8.6. Determine the class group in each case: 

(a) d = −41, (b) d = −57, (c) d = −61, (d) d = −77, (e) d = −89.
Section 9 Real Quadratic Fields


9.1. Prove that 1 + √2 is an element of infinite order in the group of units of Z[√2].
9.2. Determine the solutions of the equation x² − y²d = 1 when d is a positive integer.
9.3. (a) Prove that the size function σ(α) = |N(α)| makes the ring Z[√2] into a Euclidean
domain, and that this ring has unique factorization.
(b) Make a sketch showing the principal ideal (√2) of R = Z[√2], in the embedding
depicted in Figure 13.9.6.
9.4. Let R be the ring of integers in a real quadratic number field. What structures are possible 
for the group of units in R? 


9.5. Let R be the ring of integers in a real quadratic number field, and let Up denote the set 
of units of R that are in the first quadrant in the embedding (13.9.2). 


(a) Prove that Up is an infinite cyclic subgroup of the group of units. 
(b) Find a generator for Up when d = 3 and when d = 5. 
(c) Draw a figure showing the hyperbolas and the units in a reasonable size range for 


Section 10 About Lattices 


10.1. Let M be the integer lattice in ℝ², and let L be the lattice with basis ((2, 3)ᵗ, (3, 6)ᵗ).
Determine the index [M: L]. 


10.2. Let L C M be lattices with bases B and C, respectively, and let A be the integer matrix 
such that B = CA. Prove that [M: L] = |det A|. 


Miscellaneous Problems 
M.1. Describe the subrings S of C that are lattices in the complex plane. 


*M.2. Let R = Z[δ], where δ = √−5, and let p be a prime integer.


(a) Prove that if p splits in R, say (p) = P̄P, then exactly one of the ellipses x² + 5y² = p
or x² + 5y² = 2p contains an integer point.
(b) Find a property that determines which ellipse has an integer point. 


M.3. Describe the prime ideals in (a) the polynomial ring C[x, y] in two variables, 
(b) the ring Z[x] of integer polynomials. 
M.4. Let L denote the integer lattice Z² in the plane ℝ², and let P be a polygon in the plane
whose vertices are points of L. Pick’s Theorem asserts that the area A(P) is equal to
a + b/2 − 1, where a is the number of points of L in the interior of P, and b is the number
of points of L on the boundary of P.


(a) Prove Pick’s Theorem. 
(b) Derive Proposition 13.10.6 from Pick’s Theorem. 
CHAPTER 14


Linear Algebra in a Ring


Be wise! Generalize! 


—Picayune Sentinel! 


Solving linear equations is a basic problem of linear algebra. Here we consider systems AX = B
in which the entries of A and B are in a ring R, and we ask for solutions X = (x₁, ..., xₙ)ᵗ
with xᵢ in R. This becomes difficult when the ring R is complicated, but we will see how it
can be solved when R is the ring of integers or a polynomial ring over a field. 


14.1 MODULES 
The analog for a ring R of a vector space over a field is called a module. 


• Let R be a ring. An R-module V is an abelian group with a law of composition written +,
and a scalar multiplication R × V → V, written r, v ⇝ rv, that satisfy these axioms:


(14.1.1)   1v = v, (rs)v = r(sv), (r + s)v = rv + sv, and r(v + v') = rv + rv',


for all r and s in R and all v and v' in V.


These are precisely the axioms for a vector space (3.3.1). However, the fact that elements of 
a ring needn’t be invertible makes modules more complicated. 

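Our first examples are the modules Rⁿ of R-vectors, column vectors with entries in the
ring. They are called free modules. The laws of composition for R-vectors are the same as
for vectors with entries in a field:


$$\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix} \quad\text{and}\quad r \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} ra_1 \\ \vdots \\ ra_n \end{bmatrix}.$$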

But when R isn’t a field, it is no longer true that they are the only modules. There will be 
modules that aren’t isomorphic to any free module, though they are spanned by a finite set. 


An abelian group V, its law of composition written additively, can be made into
a module over the integers in exactly one way. The distributive law forces us to set
2v = (1 + 1)v = v + v, and so on:


nv = v + ··· + v = “n times v”


and (−n)v = −(nv), for any positive integer n. It is intuitively plausible that this makes V into a
Z-module, and also that it is the only way to do so. Let’s not bother with a formal proof.

Conversely, any Z-module has the structure of an abelian group, given by keeping only 
the addition law and forgetting about its scalar multiplication. 


(14.1.2) Abelian group and Z-module are equivalent concepts. 


We must use additive notation in the abelian group in order to make this correspondence 
seem natural, and we do so throughout the chapter. 

Abelian groups provide examples to show that modules over a ring needn’t be free. 
Since Zⁿ is infinite when n is positive, no finite abelian group except the zero group is
isomorphic to a free module. 
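

The forced rule nv = v + ··· + v can be written out for any additively written abelian group. The following is a minimal sketch, assuming the group is supplied by three hypothetical arguments `add`, `neg`, and `zero` (these names are not from the text), with the cyclic group Z/6 as the example.

```python
def z_scalar_mult(n, v, add, neg, zero):
    """Return n*v, computed only from the group law: n*v = v + ... + v and (-n)*v = -(n*v)."""
    if n < 0:
        return neg(z_scalar_mult(-n, v, add, neg, zero))
    result = zero
    for _ in range(n):
        result = add(result, v)
    return result


# The cyclic group Z/6, written additively.
def add6(a, b):
    return (a + b) % 6


def neg6(a):
    return (-a) % 6


four_times_five = z_scalar_mult(4, 5, add6, neg6, 0)     # 5 + 5 + 5 + 5 in Z/6
minus_two_times_five = z_scalar_mult(-2, 5, add6, neg6, 0)

# Every element of Z/6 is killed by the nonzero integer 6.
killed_by_six = all(z_scalar_mult(6, v, add6, neg6, 0) == 0 for v in range(6))
```

Since 6v = 0 for every v in Z/6 although 6 is not zero in Z, no nonempty set of elements of Z/6 is independent, which is another way to see that this group is not a free Z-module.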


A submodule W of an R-module V is a nonempty subset that is closed under addition 
and scalar multiplication. The laws of composition on V make a submodule W into a module. 
We've seen submodules in one case before, namely submodules of the ring R, when it is 
thought of as the free R-module R¹.


Proposition 14.1.3 The submodules of the R-module R are the ideals of R. 


By definition, an ideal is a nonempty subset of R that is closed under addition and under
multiplication by elements of R. □


The definition of a homomorphism φ: V → W of R-modules copies that of a linear
transformation of vector spaces. It is a map compatible with the laws of composition:
(14.1.4)   φ(v + v') = φ(v) + φ(v') and φ(rv) = rφ(v),
for all v and v' in V and r in R. An isomorphism is a bijective homomorphism. The kernel of
a homomorphism φ: V → W, the set of elements v in V such that φ(v) = 0, is a submodule
of the domain V, and the image of φ is a submodule of the range W.

One can extend the quotient construction to modules. Let W be a submodule of an
R-module V. The quotient module V̄ = V/W is the group of additive cosets v̄ = [v + W].
It is made into an R-module by the rule


(14.1.5)   $r\,\overline{v} = \overline{rv}$.


The main facts about quotient modules are collected together below. 


Theorem 14.1.6 Let W be a submodule of an R-module V.

(a) The set V̄ of additive cosets of W in V is an R-module, and the canonical map π: V → V̄
sending v ⇝ v̄ = [v + W] is a surjective homomorphism of R-modules whose kernel is
W.

(b) Mapping property: Let f: V → V' be a homomorphism of R-modules whose kernel K
contains W. There is a unique homomorphism f̄: V̄ → V' such that f = f̄ ∘ π.


[Diagram: the commutative triangle formed by f: V → V', π: V → V̄, and f̄: V̄ → V'.]
(c) First Isomorphism Theorem: Let f: V → V' be a surjective homomorphism of
R-modules whose kernel is equal to W. The map f̄ defined in (b) is an isomorphism.


(d) Correspondence Theorem: Let f: V → 𝒱 be a surjective homomorphism of R-modules,
with kernel W. There is a bijective correspondence between submodules of 𝒱 and
submodules of V that contain W. This correspondence is defined as follows: If 𝒮 is
a submodule of 𝒱, the corresponding submodule of V is S = f⁻¹(𝒮), and if S is a
submodule of V that contains W, the corresponding submodule of 𝒱 is 𝒮 = f(S). If S
and 𝒮 are corresponding modules, then V/S is isomorphic to 𝒱/𝒮.


We have seen the analogous facts for rings and ideals, and for groups and normal subgroups. 
The proofs follow the pattern set previously, so we omit them. 


14.2 FREE MODULES 


Free modules form an important class, and we discuss them here. Beginning in Section 14.5, 
we look at other modules. 


• Let R be a ring. An R-matrix is a matrix whose entries are in R. An invertible R-matrix
is an R-matrix that has an inverse that is also an R-matrix. The n × n invertible R-matrices
form a group called the general linear group over R:


(14.2.1)   GLₙ(R) = {n × n invertible R-matrices}.


The determinant of an R-matrix A = (aᵢⱼ) can be computed by any one of the rules
described in Chapter 1. The complete expansion (1.6.4), for example, exhibits det A as a
polynomial in the n² matrix entries, with coefficients ±1.


(14.2.2)   $\det A = \sum_{p} \pm\, a_{1,p1} \cdots a_{n,pn}$.

As before, the sum is over all permutations p of the indices {1, ..., n}, and the symbol ±
stands for the sign of the permutation. When we evaluate this formula on an R-matrix, we
obtain an element of R. Rules for the determinant, such as


(det A)(det B) = det (AB), 


continue to hold. We have proved this rule when the matrix entries are in a field (1.4.10), 
and we discuss the reason that such properties are true for R-matrices in the next section. 
Let’s assume for now that they are true. 


Lemma 14.2.3 Let R be a ring, not the zero ring. 


(a) A square R-matrix A is invertible if and only if it has either a left inverse or a right 
inverse, and also if and only if its determinant is a unit of the ring. 


(b) An invertible R-matrix is square. 


Proof. (a) If A has a left inverse L, the equation (det L)(det A) = det I = 1 shows that det A
has an inverse in R, so it is a unit. Similar reasoning shows that det A is a unit if A has a right
inverse.


If A is an R-matrix whose determinant δ is a unit, Cramer’s Rule: A⁻¹ = δ⁻¹cof(A),
where cof(A) is the cofactor matrix (1.6.7), shows that there is an inverse with coefficients
in R.


(b) Suppose that an m × n R-matrix P is invertible, i.e., that there is an n × m R-matrix Q
such that PQ = Iₘ and also QP = Iₙ. Interchanging P and Q if necessary, we may suppose
that m ≥ n. If m ≠ n, we make P and Q square by adding zeros:


$$\begin{bmatrix} P & 0 \end{bmatrix} \begin{bmatrix} Q \\ 0 \end{bmatrix} = I_m.$$


This does not change the product PQ, but the determinants of these square matrices are
zero, so they are not invertible. Therefore m = n. □


When R has few units, the fact that the determinant of an invertible matrix must be
a unit is a strong restriction. For instance, if R is the ring of integers, the determinant must
be ±1. Most integer matrices are invertible when thought of as real matrices, so they are
in GLₙ(ℝ). But unless the determinant is ±1, the entries of the inverse matrix won’t be
integers, so such matrices are not elements of GLₙ(Z). Nevertheless, when n > 1, there are many
invertible n × n R-matrices. The elementary matrices E = I + aeᵢⱼ, with i ≠ j and a in R,
are invertible, and they generate a large group.
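

To see the restriction concretely, here is a small sketch that inverts an integer matrix by Cramer's Rule and declines when the determinant is not ±1. The helper names (`det`, `cofactor_matrix`, `integer_inverse`) are illustrative, and the expansion along the first row used for `det` is chosen only for brevity; any of the equivalent rules would do.

```python
def minor(A, i, j):
    """The matrix obtained from A by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant of a square integer matrix, by expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def cofactor_matrix(A):
    """cof(A), whose (i, j) entry is (-1)^(i+j) det(A with row j and column i deleted)."""
    n = len(A)
    if n == 1:
        return [[1]]
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]


def integer_inverse(A):
    """Inverse of A as an integer matrix, or None if det A is not a unit of Z."""
    d = det(A)
    if d not in (1, -1):
        return None
    # For d = 1 or -1 the inverse of d in Z is d itself, so A^-1 = d * cof(A).
    return [[d * c for c in row] for row in cofactor_matrix(A)]


inv_unimodular = integer_inverse([[2, 1], [1, 1]])             # determinant 1
inv_elementary = integer_inverse([[1, 5, 0], [0, 1, 0], [0, 0, 1]])  # I + 5 e_12
inv_det_two = integer_inverse([[2, 0], [0, 1]])                # determinant 2: not in GL_2(Z)
```

The last matrix is invertible over the real numbers, but its inverse has the entry ½, so the function returns `None`.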


We return to the discussion of modules. The concepts of basis and independence
(Section 3.4) are carried over from vector spaces. An ordered set (v₁, ..., vₖ) of elements
of a module V is said to generate V, or to span V, if every element v is a linear
combination:


(14.2.4)   v = r₁v₁ + ··· + rₖvₖ,


with coefficients in R. If this is true, the elements vᵢ are called generators. A module V is
finitely generated if there exists a finite set of generators. Most of the modules we study will
be finitely generated.


A set of elements (v₁, ..., vₙ) of a module V is independent if, whenever a linear
combination r₁v₁ + ··· + rₙvₙ with rᵢ in R is zero, all of the coefficients rᵢ are zero. A set
(v₁, ..., vₙ) that generates V and is independent is a basis. As with vector spaces, the set
(v₁, ..., vₙ) is a basis if every v in V is a linear combination (14.2.4) in a unique way. The
standard basis E = (e₁, ..., eₙ) is a basis of Rⁿ.
We may also speak of linear combinations and independence of infinite sets, using the
terminology of Section 3.7. Even when the set is infinite, a linear combination can involve only
finitely many terms.
We denote an ordered set (v₁, ..., vₙ) of elements of V by B, as in Chapter 3. Then
multiplication by B,


$$BX = (v_1, \ldots, v_n) \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = v_1x_1 + \cdots + v_nx_n,$$


defines a homomorphism of modules that we may also denote by B:


(14.2.5)   B: Rⁿ → V.


As before, the scalars have migrated to the right side. This homomorphism is surjective if 
and only if B generates V, injective if and only if B is independent, and bijective if and only 
if B is a basis. Thus a module V has a basis if and only if it is isomorphic to one of the free 
modules Rᵏ, and if so, it is called a free module too. A module is free if and only if it has a
basis. 


Most modules have no basis. 


A free Z-module is also called a free abelian group. Lattices in ℝ² are free abelian groups,
while finite, nonzero abelian groups are not free. 

Computation with bases of free modules is done in the same way as with bases of vector
spaces. If B is a basis of a free module V, the coordinate vector of an element v, with respect
to B, is the unique column vector X such that v = BX. If two bases B = (v₁, ..., vₙ) and
B' = (v₁', ..., vᵣ') for the same free module V are given, the basechange matrix is obtained
as in Chapter 3, by writing the elements of the new basis as linear combinations of the old
basis: B' = BP.


Proposition 14.2.6 Let R be a ring that is not the zero ring. 


(a) The matrix P of a change of basis in a free module is an invertible R-matrix. 
(b) Any two bases of the same free module over R have the same cardinality. 


The proof of (a) is the same as the proof of Proposition 3.5.9, and (b) follows from (a)
and from Lemma 14.2.3. □


The number of elements of a basis for a free module V is called the rank of V. The 
rank is analogous to the dimension of a vector space. (Many concepts have different names 
when used for modules over rings.) 

As is true for vector spaces, every homomorphism f between free modules Rⁿ and
Rᵐ is given by left multiplication by an R-matrix A:


(14.2.7)   A: Rⁿ → Rᵐ.

The jth column of A is f(eⱼ). Similarly, if φ: V → W is a homomorphism of free
R-modules with bases B = (v₁, ..., vₙ) and C = (w₁, ..., wₘ), respectively, the matrix of
the homomorphism with respect to B and C is defined to be A = (aᵢⱼ), where

(14.2.8)   $\varphi(v_j) = \sum_i w_i a_{ij}$.


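If X is the coordinate vector of a vector v, i.e., if v = BX, then Y = AX is the coordinate
vector of its image, i.e., φ(v) = CY.


(14.2.9)
$$\begin{array}{ccc} R^n & \xrightarrow{\;A\;} & R^m \\ {\scriptstyle B}\big\downarrow & & \big\downarrow{\scriptstyle C} \\ V & \xrightarrow{\;\varphi\;} & W \end{array} \qquad\qquad \begin{array}{ccc} X & \rightsquigarrow & Y \\ \big\downarrow & & \big\downarrow \\ v & \rightsquigarrow & \varphi(v) \end{array}$$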
As is true for linear transformations, a change of the bases B and C by invertible R-matrices
P and Q changes the matrix of φ to A' = Q⁻¹AP.


14.3 IDENTITIES 


In this section we address the following question: Why do certain properties of matrices with 
entries in a field continue to hold when the entries are in a ring? Briefly, they continue to hold 
if they are identities, which means that they are true when the matrix entries are variables. 
To be specific, suppose that we want to prove a formula such as the multiplicative property 
of the determinant, (det A)(det B) = det (AB), or Cramer’s Rule. Suppose we have already
proved the formula for matrices with complex entries. We don’t want to do the work again,
and besides, we may have used special properties of C, such as the field axioms, to check
the formula there. We did use the properties of a field to prove the ones mentioned, so the 
proofs we gave will not work for rings. We show here how to deduce such formulas for all 
rings, once they have been shown for the complex numbers. 

The principle is quite general, but in order to focus attention, we consider the 
multiplicative property (det A)(det B) = det (AB), using the complete expansion (14.2.2) of 
the determinant as its definition. We replace the matrix entries by variables. Denoting by 
X and Y indeterminate n × n matrices, the variable identity is (det X)(det Y) = det (XY).
Let’s write 


(14.3.1) f(X, Y) = (det X)(det Y) — det (XY). 


This is a polynomial in the 2n² variable matrix entries xᵢⱼ and yₖₗ, an element of the ring
Z[{xᵢⱼ}, {yₖₗ}] of integer polynomials in those variables.

Given matrices A = (aᵢⱼ) and B = (bₖₗ) with entries in a ring R, there is a unique
homomorphism


(14.3.2)   φ: Z[{xᵢⱼ}, {yₖₗ}] → R,


the substitution homomorphism, that sends xᵢⱼ ⇝ aᵢⱼ and yₖₗ ⇝ bₖₗ.
Referring back to the definition of the determinant, we see that because φ is a
homomorphism, it will send


f(X, Y) ⇝ f(A, B) = (det A)(det B) − det (AB).


To prove the multiplicative property for matrices in an arbitrary ring, it suffices to prove that 
f is the zero element in the polynomial ring Z[{xᵢⱼ}, {yₖₗ}]. That is what it means to say that
the formula is an identity. If so, then since φ(0) = 0, it will follow that f(A, B) = 0 for any
matrices A and B in any ring. 

Now: If we were to expand f and collect terms, to write it as a linear combination of 
monomials, all coefficients would be zero. However, we don’t know how to do this, nor do 
we want to. To illustrate this point, we look at the 2 x 2 case. In that case, 


$$\begin{aligned} f(X, Y) ={}& (x_{11}x_{22} - x_{12}x_{21})(y_{11}y_{22} - y_{12}y_{21}) \\ &- (x_{11}y_{11} + x_{12}y_{21})(x_{21}y_{12} + x_{22}y_{22}) \\ &+ (x_{11}y_{12} + x_{12}y_{22})(x_{21}y_{11} + x_{22}y_{21}). \end{aligned}$$
This is the zero polynomial, but it isn’t obvious that it is zero, and we wouldn’t want to make
the computation for larger matrices.

Instead, we reason as follows: Our polynomial determines a function on the space of 
2n² complex variables {xᵢⱼ, yₖₗ} by evaluation: If A and B are complex matrices and if we
evaluate f at {aᵢⱼ, bₖₗ}, we obtain f(A, B) = (det A)(det B) − det (AB). We know that
f(A, B) is equal to zero because our identity is true for complex matrices. So the function 
that f determines is identically zero. The only (formal) polynomial that defines the zero 
function is the zero polynomial. Therefore f is equal to zero. 
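

For the 2 × 2 case, the expansion can be delegated to a few lines of code. The sketch below is an illustration only and is not the argument used above: it represents an integer polynomial in the eight variables as a dictionary from exponent tuples to coefficients (an encoding assumed here for convenience), expands f(X, Y) = (det X)(det Y) − det (XY) completely, and collects terms.

```python
from collections import defaultdict

VARS = ["x11", "x12", "x21", "x22", "y11", "y12", "y21", "y22"]


def var(name):
    """The polynomial consisting of the single variable `name`."""
    return {tuple(1 if v == name else 0 for v in VARS): 1}


def add(p, q, sign=1):
    """p + sign*q, with zero coefficients dropped."""
    out = defaultdict(int)
    for m, c in p.items():
        out[m] += c
    for m, c in q.items():
        out[m] += sign * c
    return {m: c for m, c in out.items() if c != 0}


def mul(p, q):
    """The product p*q: multiply coefficients and add exponent tuples."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(a + b for a, b in zip(m1, m2))] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def det2(a, b, c, d):
    """Determinant ad - bc of the 2 x 2 matrix with rows (a, b) and (c, d)."""
    return add(mul(a, d), mul(b, c), sign=-1)


x11, x12, x21, x22, y11, y12, y21, y22 = (var(v) for v in VARS)

# Entries of the product matrix XY.
p11 = add(mul(x11, y11), mul(x12, y21))
p12 = add(mul(x11, y12), mul(x12, y22))
p21 = add(mul(x21, y11), mul(x22, y21))
p22 = add(mul(x21, y12), mul(x22, y22))

det_X = det2(x11, x12, x21, x22)
det_Y = det2(y11, y12, y21, y22)
f = add(mul(det_X, det_Y), det2(p11, p12, p21, p22), sign=-1)
```

After collection every coefficient cancels, so `f` is the empty dictionary, which is the zero polynomial in this encoding. The number of monomials to expand grows quickly with n, which is one reason to prefer the evaluation argument.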


It is possible to formalize this discussion and to prove a general theorem about the 
validity of identities in an arbitrary ring. However, even mathematicians occasionally feel 
that formulating a general theorem isn’t worthwhile — that it is easier to consider each case 
as it comes along. This is one of those occasions. 


14.4 DIAGONALIZING INTEGER MATRICES 


We consider the problem mentioned at the beginning of the chapter: Given an m × n integer
matrix A (a matrix whose entries are integers) and an integer column vector B, find the integer
solutions of the system of linear equations


(14.4.1)   AX = B.


Left multiplication by the integer matrix A defines a map A: Zⁿ → Zᵐ. Its kernel is the
set of integer solutions of the homogeneous equation AX = 0, and its image is the set of 
integer vectors B such that the equation AX = B has a solution in integers. As usual, all 
solutions of the inhomogeneous equation AX = B can be obtained from a particular one by 
adding solutions of the homogeneous equation. 

When the coefficients are in a field, row reduction is often used to solve linear equations. 
These operations are more restricted here: We should use them only when they are given 
by invertible integer matrices — integer matrices that have integer matrices as their inverses. 
The invertible integer matrices form the integer general linear group GLₙ(Z).

The best results will be obtained when we use both row and column operations to 
simplify a matrix. So we allow these operations: 


(14.4.2) 


• add an integer multiple of one row to another, or add an integer multiple of one
column to another;

• interchange two rows or two columns;

• multiply a row or column by −1.


Any such operation can be made by multiplying A on the left or right by an elementary 
integer matrix — an elementary matrix that is an invertible integer matrix. The result of a 
sequence of operations will have the form 


(14.4.3)   A' = Q⁻¹AP,


where Q and P are invertible integer matrices of the appropriate sizes. 
Over a field, any matrix can be brought into the block form


$$A' = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$$


by row and column operations (4.2.10). We can’t hope for such a result when working with
integers: We can’t do it for 1 × 1 matrices. But we can diagonalize.
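An example:


(14.4.4)
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 6 & 6 \end{bmatrix} \xrightarrow{\text{row oper}} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -2 & -6 \end{bmatrix} \xrightarrow{\text{col opers}} \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & -6 \end{bmatrix} \xrightarrow{\text{row oper}} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 6 \end{bmatrix} \xrightarrow{\text{col oper}} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \end{bmatrix} = A'.$$


The matrix obtained has the form A' = Q⁻¹AP, where Q and P are invertible integer
matrices:


(14.4.5)
$$Q^{-1} = \begin{bmatrix} 1 & 0 \\ 4 & -1 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 1 & -2 & 3 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}.$$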


(It is easy to make a mistake when computing these matrices. To compute Q⁻¹, the
elementary matrices that produce the row operations multiply in reverse order, while to 
compute P one must multiply in the order that the operations are made.) 


Theorem 14.4.6 Let A be an integer matrix. There exist products Q and P of elementary
integer matrices of appropriate sizes, so that A' = Q⁻¹AP is diagonal, say


$$A' = \begin{bmatrix} d_1 & & & \\ & \ddots & & \\ & & d_k & \\ & & & 0 \end{bmatrix},$$


where the diagonal entries dᵢ are positive, and each one divides the next: d₁ | d₂ | ··· | dₖ.


Note that the diagonal will not lead to the bottom right corner unless A is a square matrix, 
and if k is less than both m and n, the diagonal will have some zeros at the end. 

We can sum up the information inherent in the four matrices that appear in the theorem 
by the diagram 


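(14.4.7)
$$\begin{array}{ccc} Z^n & \xrightarrow{\;A'\;} & Z^m \\ {\scriptstyle P}\big\downarrow & & \big\downarrow{\scriptstyle Q} \\ Z^n & \xrightarrow{\;A\;} & Z^m \end{array}$$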


where the maps are labeled by the matrices that are used to define them. 
Proof. We assume A ≠ 0. The strategy is to perform a sequence of operations, so as to end
up with a matrix


(14.4.8)
$$\begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & M & \\ 0 & & & \end{bmatrix}$$


in which d₁ divides every entry of M. When this is done, we work on M. We describe a
systematic method, though it may not be the quickest way to proceed. The method is based
on repeated division with remainder.


Step 1: By permuting rows and columns, we move a nonzero entry with smallest absolute
value to the upper left corner. We multiply the first row by −1 if necessary, so that this upper
left entry a₁₁ becomes positive.

Next, we try to clear out the first column. Whenever an operation produces a nonzero
entry in the matrix whose absolute value is smaller than a₁₁, we go back to Step 1 and start
the whole process over. This will spoil the work we have done, but progress is made because
a₁₁ decreases. We won’t need to return to Step 1 infinitely often.


Step 2: If the first column contains a nonzero entry a;,; with i > 1, we divide by a): 
aij = 41g +7, 


where g and r are integers, and the remainder r is in the range 0 < r < a;,. We subtract 
q(row 1) from (row /). This changes a; to r. If r+0, we go back to Step 1. If r = 0, we have 
produced a zero in the first column. 


Finitely many repetitions of Steps 1 and 2 result in a matrix in which a;; = 0 for all 
i > 1. Similarly, we may use column operations to clear out the first row, eventually ending 
up with a matrix in which the only nonzero entry in the first row and the first column is aj. 


Step 3: Assume that a), 1s the only nonzero entry in the first row and column, but that some 
entry 6 of M is not divisible by a,;;. We add the column of A that contains b to column 1. 
This produces an entry b in the first column. We go back to Step 2. Division with remainder 
produces a smaller nonzero matrix entry. sending us back to Step 1. al 
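Steps 1 to 3 can be turned into a short program. The following Python sketch is one possible transcription, not the only way to organize the computation. It works on a copy of the matrix, it does not record $Q$ and $P$, and it clears the first column and the first row in the same pass before deciding whether to return to Step 1. The function name `diagonalize` is ours.

```python
def diagonalize(A):
    """Diagonalize an integer matrix by integer row and column operations.

    Returns a diagonal matrix whose nonzero diagonal entries are positive
    and each divide the next, following Steps 1-3 of the proof.
    """
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    t = 0                                   # rows/columns before t are done
    while t < min(m, n):
        nonzero = [(abs(A[i][j]), i, j)
                   for i in range(t, m) for j in range(t, n) if A[i][j] != 0]
        if not nonzero:
            break                           # the remaining block M is zero
        # Step 1: move an entry of smallest absolute value to position (t, t).
        _, i, j = min(nonzero)
        A[t], A[i] = A[i], A[t]
        for row in A:
            row[t], row[j] = row[j], row[t]
        if A[t][t] < 0:
            A[t] = [-x for x in A[t]]
        d = A[t][t]
        # Step 2: division with remainder, first down the column ...
        for i in range(t + 1, m):
            q = A[i][t] // d                # floor division: 0 <= r < d
            A[i] = [x - q * y for x, y in zip(A[i], A[t])]
        # ... and then along the row, using column operations.
        for j in range(t + 1, n):
            q = A[t][j] // d
            for row in A:
                row[j] -= q * row[t]
        if any(A[i][t] for i in range(t + 1, m)) or any(A[t][j] for j in range(t + 1, n)):
            continue                        # a smaller entry exists: back to Step 1
        # Step 3: if d fails to divide some entry b of M, add its column to column t.
        bad = [(i, j) for i in range(t + 1, m) for j in range(t + 1, n) if A[i][j] % d]
        if bad:
            _, j = bad[0]
            for row in A:
                row[t] += row[j]
            continue
        t += 1                              # d divides all of M: work on M
    return A

print(diagonalize([[2, 0], [0, 3]]))
```

The matrix with diagonal entries 2 and 3 exercises Step 3: the entry 3 is not divisible by 2, and the result has diagonal entries 1 and 6.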


We are now ready to solve the integer linear system AX = B. 


Proposition 14.4.9 Let $A$ be an $m \times n$ matrix, and let $P$ and $Q$ be invertible integer matrices
such that $A' = Q^{-1}AP$ has the diagonal form described in Theorem 14.4.6.

(a) The integer solutions of the homogeneous equation $A'X' = 0$ are the integer vectors $X'$
whose first $k$ coordinates are zero.

(b) The integer solutions of the homogeneous equation $AX = 0$ are those of the form
$X = PX'$, where $A'X' = 0$.

(c) The image $W'$ of multiplication by $A'$ consists of the integer combinations of the vectors
$d_1e_1, \ldots, d_ke_k$.

(d) The image $W$ of multiplication by $A$ consists of the vectors $Y = QY'$, where $Y'$ is in $W'$.

Proof. (a) Because $A'$ is diagonal, the equation $A'X' = 0$ reads

$$d_1x'_1 = 0, \quad d_2x'_2 = 0, \quad \ldots, \quad d_kx'_k = 0.$$

In order for $X'$ to solve the diagonal system $A'X' = 0$, we must have $x'_i = 0$ for $i = 1, \ldots, k$,
and $x'_i$ can be arbitrary if $i > k$.

(c) The image of the map $A'$ is generated by the columns of $A'$, and because $A'$ is diagonal,
the columns are especially simple: $A'_j = d_je_j$ if $j \le k$, and $A'_j = 0$ if $j > k$.

(b),(d) We regard $Q$ and $P$ as matrices of changes of basis in $\mathbb{Z}^m$ and $\mathbb{Z}^n$, respectively. The
vertical arrows in the diagram 14.4.7 are bijective, so $P$ carries the kernel of $A'$ bijectively to
the kernel of $A$, and $Q$ carries the image of $A'$ bijectively to the image of $A$. □


We go back to example (14.4.4). Looking at the matrix $A'$ we see that the solutions
of $A'X' = 0$ are the integer multiples of $e_3$. So the solutions of $AX = 0$ are the integer
multiples of $Pe_3$, which is the third column $(3, -3, 1)^t$ of $P$. The image of $A'$ consists of integer
combinations of the vectors $e_1$ and $2e_2$, and the image of $A$ is obtained by multiplying these
vectors by $Q$. It happens in this example that $Q = Q^{-1}$. So the image consists of the integer
combinations of the columns of the matrix

$$QA' = \begin{bmatrix} 1 & 0 \\ 4 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 4 & -2 & 0 \end{bmatrix}.$$

Of course, the image of $A$ is also the set of integer combinations of the columns of $A$, but
those columns do not form a $\mathbb{Z}$-basis.

The solution we have found isn’t unique. A different sequence of row and column
operations could produce different bases for the kernel and image. But in our example, the
kernel is spanned by one vector, so that vector is unique up to sign.
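These conclusions are easy to test numerically. The sketch below again assumes that the matrix of Example (14.4.4) is A = [[1, 2, 3], [4, 6, 6]], which is not shown in this passage. It checks that $(3, -3, 1)^t$ is in the kernel, and it expresses each column of $A$ as an integer combination of the two nonzero columns $(1, 4)^t$ and $(0, -2)^t$ of $QA'$. The helper names are ours.

```python
A = [[1, 2, 3], [4, 6, 6]]            # assumed matrix of Example (14.4.4)
kernel_generator = [3, -3, 1]         # third column of P

def apply(M, x):
    """Matrix-vector product M x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def image_coordinates(y):
    """Integers (c1, c2) with y = c1*(1, 4) + c2*(0, -2), or None if none exist."""
    c1 = y[0]
    twice_c2 = 4 * c1 - y[1]          # second coordinate: 4*c1 - 2*c2 = y[1]
    if twice_c2 % 2:
        return None
    return (c1, twice_c2 // 2)

columns_of_A = [list(col) for col in zip(*A)]
print(apply(A, kernel_generator))
print([image_coordinates(col) for col in columns_of_A])
print(image_coordinates([0, 1]))      # a vector that is not in the image
```

The last call illustrates that the image is a proper subgroup of $\mathbb{Z}^2$: the vector $(0, 1)^t$ has no integer coordinates with respect to the basis found.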


Submodules of Free Modules 


The theorem on diagonalization of integer matrices can be used to describe homomorphisms 
between free abelian groups. 


Corollary 14.4.10 Let $\varphi: V \to W$ be a homomorphism of free abelian groups. There
exist bases of $V$ and $W$ such that the matrix of the homomorphism has the diagonal
form (14.4.6). □


Theorem 14.4.11 Let $W$ be a free abelian group of rank $m$, and let $U$ be a subgroup of $W$.
Then $U$ is a free abelian group, and its rank is less than or equal to $m$.


Proof. We begin by choosing a basis $C = (w_1, \ldots, w_m)$ for $W$ and a set of generators
$B = (u_1, \ldots, u_n)$ for $U$. We write $u_j = \sum_i w_ia_{ij}$, and we let $A = (a_{ij})$. The columns of
the matrix $A$ are the coordinate vectors of the generators $u_j$, when computed with respect to
the basis $C$ of $W$. We obtain a commutative diagram of homomorphisms of abelian groups

(14.4.12)
$$\begin{array}{ccc} \mathbb{Z}^n & \xrightarrow{\;A\;} & \mathbb{Z}^m \\ {\scriptstyle B}\downarrow & & \downarrow{\scriptstyle C} \\ U & \xrightarrow{\;i\;} & W \end{array}$$

where $i$ denotes the inclusion of $U$ into $W$. Because $C$ is a basis, the right vertical arrow is
bijective, and because $B$ generates $U$, the left vertical arrow is surjective.


We diagonalize $A$. With the usual notation $A' = Q^{-1}AP$, we interpret $P$ as the matrix
of a change of basis for $\mathbb{Z}^n$, and $Q$ as the matrix of a change of basis in $\mathbb{Z}^m$. Let the new bases
be $C'$ and $B'$. Since our original choices of basis $C$ and the generating set $B$ were arbitrary,
we may replace $C$, $B$ and $A$ by $C'$, $B'$ and $A'$ in the above diagram. So we may assume that
the matrix $A$ has the diagonal form given in (14.4.6). Then $u_j = d_jw_j$ for $j = 1, \ldots, k$.


Roughly speaking, this is the proof, but there are still a few points to consider. First,
the diagonal matrix $A$ may contain columns of zeros. A column of zeros corresponds to a
generator $u_j$ whose coordinate vector with respect to the basis $C$ of $W$ is the zero vector. So
$u_j$ is zero too. This vector is useless as a generator, so we throw it out. When we have done
this, all diagonal entries will be positive, and we will have $k = n$ and $n \le m$.


If $U$ is the zero subgroup, we will end up throwing out all the generators. As with
vector spaces, we must agree that the empty set is a basis for the zero module, or else
mention this exceptional case in the statement of the theorem.


We assume that the $m \times n$ matrix $A$ is diagonal, with positive diagonal entries
$d_1, \ldots, d_n$ and with $n \le m$, and we show that the set $(u_1, \ldots, u_n)$ is a basis of $U$. Since this
set generates $U$, what has to be proved is that it is independent. We write a linear relation
$a_1u_1 + \cdots + a_nu_n = 0$ in the form $a_1d_1w_1 + \cdots + a_nd_nw_n = 0$. Since $(w_1, \ldots, w_m)$ is a
basis, $a_id_i = 0$ for each $i$, and since $d_i > 0$, $a_i = 0$.

The final point is more serious: We needed a finite set of generators of $U$ to get started.
How do we know that there is such a set? It is a fact that every subgroup of a finitely
generated abelian group is finitely generated. We prove this in Section 14.6. For the moment,
the theorem is proved only with the additional hypothesis that $U$ is finitely generated. □


Suppose that a lattice $L$ in $\mathbb{R}^2$ with basis $B = (v_1, v_2)$ is a sublattice of the lattice $M$
with the basis $C = (u_1, u_2)$, and let $A$ be the integer matrix such that $B = CA$. If we change
bases in $L$ and $M$, the matrix $A$ will be changed to a matrix $A' = Q^{-1}AP$, where $P$ and $Q$ are
invertible integer matrices. According to Theorem 14.4.6, bases can be chosen so that $A$ is
diagonal, with positive diagonal entries $d_1$ and $d_2$. Suppose that this has been done. Then if
$B = (v_1, v_2)$ and $C = (u_1, u_2)$, the equation $B = CA$ reads $v_1 = d_1u_1$ and $v_2 = d_2u_2$.


Example 14.4.13 Let $A = \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}$. Then

$$A' = Q^{-1}AP = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}.$$

Let $M$ be the integer lattice with its standard basis $C = (e_1, e_2)$, and let $L$ be the lattice
with basis $B = (v_1, v_2) = ((2, 1)^t, (-1, 2)^t)$. Its coordinate vectors are the columns of $A$. We
interpret $P$ as the matrix of a change of basis in $L$, and $Q$ as the matrix of change of basis
in $M$. In coordinate vector form, the new bases are $C' = (e_1, e_2)Q = ((1, 3)^t, (0, 1)^t)$ and
$B' = (v_1, v_2)P = ((1, 3)^t, (0, 5)^t)$.

The left-hand figure below shows the squares spanned by the two original bases,
and the figure on the right shows the parallelograms spanned by the two new bases.
The parallelogram spanned by the new basis for $L$ is filled precisely by five translates
of the shaded parallelogram, which is the parallelogram spanned by the new basis for
$M$. The index is 5. Note that there are five lattice points in the region $\Pi'(v_1, v_2)$. This
agrees with Proposition 13.10.3(d). The figure on the right also makes it clear that the ratio
$\Delta(L)/\Delta(M)$ is 5. □


[Figure: two lattice diagrams. Left: the squares spanned by the two original bases. Right: the parallelograms spanned by the two new bases.]

(14.4.14) Diagonalization, Applied to a Sublattice.
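The numbers in Example 14.4.13 can be confirmed directly. The sketch below takes $A$, $Q^{-1}$, and $P$ to be the matrices whose columns give the bases stated in the example, which is our reading of a partly illegible display. It verifies the product $Q^{-1}AP$ and counts the points of the integer lattice $M$ in the half-open parallelogram spanned by $v_1 = (2, 1)^t$ and $v_2 = (-1, 2)^t$. A point $(x, y)$ equals $sv_1 + tv_2$ with $5s = 2x + y$ and $5t = -x + 2y$, so the conditions $0 \le s, t < 1$ become integer inequalities.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, -1], [1, 2]]        # columns are v1 and v2
Q_inv = [[1, 0], [-3, 1]]    # assumed reading of the example
P = [[1, 1], [1, 2]]         # assumed reading of the example

A_prime = matmul(matmul(Q_inv, A), P)
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Integer points s*v1 + t*v2 with 0 <= s < 1 and 0 <= t < 1.
points = [(x, y) for x in range(-3, 4) for y in range(-3, 4)
          if 0 <= 2 * x + y < 5 and 0 <= -x + 2 * y < 5]

print(A_prime, det_A, points)
```

The count of lattice points, the determinant of $A$, and the product of the diagonal entries of $A'$ all give the index 5.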


14.5 GENERATORS AND RELATIONS 


In this section we turn our attention to modules that are not free. We show how to describe
a large class of modules by means of matrices called presentation matrices.

Left multiplication by an $m \times n$ $R$-matrix $A$ defines a homomorphism of $R$-modules
$R^n \xrightarrow{A} R^m$. Its image consists of all linear combinations of the columns of $A$ with
coefficients in the ring, and we may denote the image by $AR^n$. We say that the quotient
module $V = R^m/AR^n$ is presented by the matrix $A$. More generally, we call any isomorphism
$\sigma: R^m/AR^n \to V$ a presentation of a module $V$, and we say that the matrix $A$ is a presentation
matrix for $V$ if there is such an isomorphism.

For example, the cyclic group $C_5$ of order 5 is presented as a $\mathbb{Z}$-module by the $1 \times 1$
integer matrix $[5]$, because $C_5$ is isomorphic to $\mathbb{Z}/5\mathbb{Z}$.

We use the canonical map $\pi: R^m \to V = R^m/AR^n$ (14.1.6) to interpret the quotient
module $V = R^m/AR^n$, as follows:

Proposition 14.5.1

(a) $V$ is generated by a set of elements $B = (v_1, \ldots, v_m)$, the images of the standard basis
elements of $R^m$.

(b) If $Y = (y_1, \ldots, y_m)^t$ is a column vector in $R^m$, the element $BY = v_1y_1 + \cdots + v_my_m$
of $V$ is zero if and only if $Y$ is a linear combination of the columns of $A$, with coefficients
in $R$; that is, if and only if there exists a column vector $X$ with entries in $R$ such that $Y = AX$.


Proof. The images of the standard basis elements generate $V$ because the map $\pi$ is
surjective. Its kernel is the submodule $AR^n$. This submodule consists precisely of the linear
combinations of the columns of $A$. □


• If a module $V$ is generated by a set $B = (v_1, \ldots, v_m)$, we call an element $Y$ of $R^m$ such
that $BY = 0$ a relation vector, or simply a relation among the generators. We may also refer
to the equation $v_1y_1 + \cdots + v_my_m = 0$ as a relation, meaning that the left side yields 0
when it is evaluated in $V$. A set $S$ of relations is a complete set if every relation is a linear
combination of $S$ with coefficients in the ring.


Example 14.5.2 The $\mathbb{Z}$-module, or abelian group, $V$ that is generated by three elements
$v_1, v_2, v_3$ with the complete set of relations

(14.5.3)
$$\begin{aligned} 3v_1 + 2v_2 + v_3 &= 0 \\ 8v_1 + 4v_2 + 2v_3 &= 0 \\ 7v_1 + 6v_2 + 2v_3 &= 0 \\ 9v_1 + 6v_2 + v_3 &= 0 \end{aligned}$$

is presented by the matrix

(14.5.4)   $A = \begin{bmatrix} 3 & 8 & 7 & 9 \\ 2 & 4 & 6 & 6 \\ 1 & 2 & 2 & 1 \end{bmatrix}$.

Its columns are the coefficients of the relations (14.5.3):
$(v_1, v_2, v_3)A = (0, 0, 0, 0)$. □


We now describe a theoretical method of finding a presentation of an $R$-module $V$.
The method is very simple: We choose a set of generators $B = (v_1, \ldots, v_m)$ for $V$. These
generators provide us with a surjective homomorphism $R^m \to V$ that sends a column vector
$Y$ to the linear combination $BY = v_1y_1 + \cdots + v_my_m$. Let us denote the kernel of this map
by $W$. It is the module of relations: its elements are the relation vectors.

We repeat the procedure, choosing a set of generators $C = (w_1, \ldots, w_n)$ for $W$, and
we use these generators to define a surjective map $R^n \to W$. But here the generators $w_j$
are elements of $R^m$. They are column vectors. We assemble the coordinate vectors $A_j$ of $w_j$
into an $m \times n$ matrix

(14.5.5)   $A = \begin{bmatrix} | & & | \\ A_1 & \cdots & A_n \\ | & & | \end{bmatrix}$.

Then multiplication by $A$ defines a map

$$R^n \xrightarrow{A} R^m$$

that sends $e_j \mapsto A_j = w_j$. It is the composition of the map $R^n \to W$ with the inclusion
$W \subset R^m$. By construction, $W$ is its image, and we denote it by $AR^n$.

Since the map $R^m \to V$ is surjective, the First Isomorphism Theorem tells us that $V$ is
isomorphic to $R^m/W = R^m/AR^n$. Therefore the module $V$ is presented by the matrix $A$.
Thus the presentation matrix $A$ for a module $V$ is determined by

(14.5.6)
• a set of generators for $V$, and
• a set of generators for the module of relations $W$.


Unless the set of generators forms a basis of $V$, in which case $A$ is empty, the number of
generators will be equal to the number of rows of $A$.

This construction depends on two assumptions: We must assume that our module $V$
has a finite set of generators. Fair enough: We can’t expect to describe a module that is too
big, such as an infinite dimensional vector space, in this way. We must also assume that the
module $W$ of relations has a finite set of generators. This is a less desirable assumption
because $W$ is not given; it is an auxiliary module that was obtained in the course of the
construction. We need to examine this point more closely, and we do this in the next section
(see (14.6.5)). But except for this point, we can now speak of generators and relations for a
finitely generated $R$-module $V$.

Since the presentation matrix depends on the choices (14.5.6), many matrices present 
the same module, or isomorphic modules. Here are some rules for manipulating a matrix A 
without changing the isomorphism class of the module it presents: 


Proposition 14.5.7 Let $A$ be an $m \times n$ presentation matrix for a module $V$. The following
matrices $A'$ present the same module $V$:
(i) $A' = Q^{-1}A$, with $Q$ in $GL_m(R)$;
(ii) $A' = AP$, with $P$ in $GL_n(R)$;
(iii) $A'$ is obtained by deleting a column of zeros;
(iv) the $j$th column of $A$ is $e_i$, and $A'$ is obtained from $A$ by deleting (row $i$) and
(column $j$).


The operations (iii) and (iv) can also be done in reverse. One can add a column of zeros, or
one can add a new row and column with 1 as their common entry, all other entries being zero.


Proof. We refer to the map $R^n \xrightarrow{A} R^m$ defined by the matrix.

(i) The change of $A$ to $Q^{-1}A$ corresponds to a change of basis in $R^m$.
(ii) The change of $A$ to $AP$ corresponds to a change of basis in $R^n$.
(iii) A column of zeros corresponds to the trivial relation, which can be omitted.
(iv) A column of $A$ equal to $e_i$ corresponds to the relation $v_i = 0$. The zero element is
useless as a generator, and its appearance in any other relation is irrelevant. So we may
delete $v_i$ from the generating set and from the relations. Doing so changes the matrix
$A$ by deleting the $i$th row and $j$th column. □

It may be possible to simplify a matrix quite a lot by these rules. For instance, our
original example of the integer matrix (14.5.4) reduces as follows:

$$A = \begin{bmatrix} 3 & 8 & 7 & 9 \\ 2 & 4 & 6 & 6 \\ 1 & 2 & 2 & 1 \end{bmatrix} \to \begin{bmatrix} 0 & 2 & 1 & 6 \\ 0 & 0 & 2 & 4 \\ 1 & 2 & 2 & 1 \end{bmatrix} \to \begin{bmatrix} 2 & 1 & 6 \\ 0 & 2 & 4 \end{bmatrix} \to \begin{bmatrix} 0 & 1 & 0 \\ -4 & 2 & -8 \end{bmatrix} \to \begin{bmatrix} 0 & 1 & 0 \\ -4 & 0 & -8 \end{bmatrix}$$
$$\to \begin{bmatrix} -4 & -8 \end{bmatrix} \to \begin{bmatrix} 4 & 8 \end{bmatrix} \to \begin{bmatrix} 4 & 0 \end{bmatrix} \to \begin{bmatrix} 4 \end{bmatrix}.$$

Thus $A$ presents the abelian group $\mathbb{Z}/4\mathbb{Z}$.

By definition, an $m \times n$ matrix presents a module by means of $m$ generators and $n$
relations. But as we see from this example, the numbers $m$ and $n$ depend on choices; they
are not uniquely determined by the module.


Another example: The $2 \times 1$ matrix $\begin{bmatrix} 4 \\ 0 \end{bmatrix}$ presents an abelian group $V$ by means of two
generators $(v_1, v_2)$ and one relation $4v_1 = 0$. We can’t simplify this matrix. The abelian
group that it presents is the direct sum $\mathbb{Z}/4\mathbb{Z} \oplus \mathbb{Z}$ of a cyclic group of order four and an
infinite cyclic group (see Section 14.7). On the other hand, as we saw above, the matrix
$\begin{bmatrix} 4 & 0 \end{bmatrix}$ presents a group with one generator $v_1$ and two relations, the second of which is the
trivial relation. It is a cyclic group of order 4.
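For integer matrices, the group presented by $A$ can also be identified without carrying out a reduction by hand. The sketch below does not use the rules of Proposition 14.5.7. It relies on a standard fact that is not proved in this chapter: if $d_1, \ldots, d_k$ are the diagonal entries of the diagonal form of Theorem 14.4.6, then $d_1 \cdots d_j$ is the greatest common divisor of the $j \times j$ minors of $A$. The function names are ours.

```python
from itertools import combinations
from math import gcd

def det(M):
    """Determinant of a square integer matrix, by expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def minor_gcds(A):
    """gcd of the j x j minors of A for j = 1, 2, ..., until all minors vanish."""
    m, n = len(A), len(A[0])
    gcds = []
    for j in range(1, min(m, n) + 1):
        g = 0
        for rows in combinations(range(m), j):
            for cols in combinations(range(n), j):
                g = gcd(g, det([[A[r][c] for c in cols] for r in rows]))
        if g == 0:
            break
        gcds.append(g)
    return gcds

def presented_group(A):
    """Return (orders of the nontrivial cyclic summands, rank of the free summand)."""
    gcds = minor_gcds(A)
    diagonal = [g // h for g, h in zip(gcds, [1] + gcds[:-1])]
    cyclic_orders = [d for d in diagonal if d > 1]
    free_rank = len(A) - len(diagonal)      # generators with no relation left
    return cyclic_orders, free_rank

print(presented_group([[3, 8, 7, 9], [2, 4, 6, 6], [1, 2, 2, 1]]))
print(presented_group([[4], [0]]))
print(presented_group([[4, 0]]))
```

The three calls correspond to the matrix (14.5.4), the $2 \times 1$ matrix with entries 4 and 0, and the $1 \times 2$ matrix with the same entries.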


14.6 NOETHERIAN RINGS 


In this section we discuss finite generation of the module of relations. For modules over a 
nasty ring, the module of relations needn’t be finitely generated, though V is. Fortunately 
this doesn’t occur with the rings we have been studying, as we show here. 


Proposition 14.6.1 The following conditions on an $R$-module $V$ are equivalent:

(i) Every submodule of $V$ is finitely generated;
(ii) ascending chain condition: There is no infinite strictly increasing chain
$W_1 < W_2 < \cdots$ of submodules of $V$.


Proof. Assume that $V$ satisfies the ascending chain condition, and let $W$ be a submodule of
$V$. We select a set of generators of $W$ in the following way: If $W = 0$, then $W$ is generated by
the empty set. If not, we start with a nonzero element $w_1$ of $W$, and we let $W_1$ be the span of
$(w_1)$. If $W_1 = W$ we stop. If not, we choose an element $w_2$ of $W$ not in $W_1$, and we let $W_2$
be the span of $(w_1, w_2)$. Then $W_1 < W_2$. If $W_2 < W$, we choose an element $w_3$ not in $W_2$,
etc. In this way we obtain a strictly increasing chain $W_1 < W_2 < \cdots$ of submodules of $W$.
Since $V$ satisfies the ascending chain condition, this chain cannot be continued indefinitely.
Therefore some $W_k$ is equal to $W$, and then $(w_1, \ldots, w_k)$ generates $W$.


The proof of the converse is similar to the proof of Proposition 12.2.13, which states that
factoring terminates in a domain if and only if it has no strictly increasing chain of principal
ideals. Assume that every submodule of $V$ is finitely generated, and let $W_1 \subset W_2 \subset \cdots$
be an infinite weakly increasing chain of submodules of $V$. We show that this chain is not
strictly increasing. Let $U$ denote the union of these submodules. Then $U$ is a submodule of
$V$. The proof is the same as the one given for ideals (12.2.15). So $U$ is finitely generated. Let
$(u_1, \ldots, u_r)$ be a set of generators for $U$. Each $u_\nu$ is in one of the modules $W_i$, and since the
chain is increasing, there is an $i$ such that $W_i$ contains all of the elements $u_1, \ldots, u_r$. Then
$W_i$ contains the module $U$ generated by $(u_1, \ldots, u_r)$: $U \subset W_i \subset W_{i+1} \subset U$. This shows that
$U = W_i = W_{i+1} = U$, and that the chain is not strictly increasing. □


Definition 14.6.2 A ring R is noetherian if every ideal of R is finitely generated. 


Corollary 14.6.3 A ring is noetherian if and only if it satisfies the ascending chain condition:
There is no infinite strictly increasing chain $I_1 < I_2 < \cdots$ of ideals of $R$. □


Principal ideal domains are noetherian because every ideal in such a ring is generated
by one element. So the rings $\mathbb{Z}$, $\mathbb{Z}[i]$, and $F[x]$, with $F$ a field, are noetherian.
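In the ring $\mathbb{Z}$ the ascending chain condition can be watched at work. The ideal generated by integers $a_1, \ldots, a_k$ is the principal ideal generated by their greatest common divisor, so the chain $(a_1) \subset (a_1, a_2) \subset \cdots$ is described by a sequence of gcds, each dividing the one before. The Python sketch below computes these generators for a list of integers chosen only for illustration. The function name `ideal_chain` is ours.

```python
from itertools import accumulate
from math import gcd

def ideal_chain(generators):
    """Generators g_k of the ideals (a_1, ..., a_k) of Z, for k = 1, 2, ..."""
    return list(accumulate(generators, gcd))

chain = ideal_chain([360, 84, 90, 35, 1000, 7])
strict_steps = sum(1 for g, h in zip(chain, chain[1:]) if g != h)
print(chain, strict_steps)
```

Each strict inclusion replaces the generator by a proper divisor, and a positive integer has only finitely many divisors, so the chain must become stationary.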


Corollary 14.6.4 Let $R$ be a noetherian ring. Every proper ideal $I$ of $R$ is contained in a
maximal ideal.


Proof. If $I$ is not maximal itself, then it is properly contained in a proper ideal $I_2$, and if $I_2$
is not maximal, it is properly contained in a proper ideal $I_3$, and so on. By the ascending
chain condition (14.6.1), the chain $I < I_2 < I_3 < \cdots$ must be finite. Therefore $I_k$ is maximal for
some $k$. □


The relevance of the concept of a noetherian ring to the problem of finite generation of a 
submodule is shown by the following theorem: 


Theorem 14.6.5 Let R be a noetherian ring. Every submodule of a finitely generated 
R-module V is finitely generated. 


Proof. Case 1: $V = R^m$. We use induction on $m$. A submodule of $R^1$ is an ideal of $R$
(14.1.3). Since $R$ is noetherian, the theorem is true when $m = 1$. Suppose that $m > 1$. We
consider the projection

$$\pi: R^m \to R^{m-1}$$

given by dropping the last entry: $\pi(a_1, \ldots, a_m) = (a_1, \ldots, a_{m-1})$. Its kernel is the set of
vectors of $R^m$ whose first $m - 1$ coordinates are zero. Let $W$ be a submodule of $R^m$, and let
$\varphi: W \to R^{m-1}$ be the restriction of $\pi$ to $W$. The image $\varphi(W)$ is a submodule of $R^{m-1}$. It is
finitely generated by induction. Also, $\ker \varphi = (W \cap \ker \pi)$ is a submodule of $\ker \pi$, which
is a module isomorphic to $R^1$. So $\ker \varphi$ is finitely generated. Lemma 14.6.6 shows that $W$ is
finitely generated.


Case 2: The general case. Let $V$ be a finitely generated $R$-module. Then there is a surjective
map $\varphi: R^m \to V$ from a free module to $V$. Given a submodule $W$ of $V$, the Correspondence
Theorem tells us that $U = \varphi^{-1}(W)$ is a submodule of the module $R^m$, so it is finitely
generated, and $W = \varphi(U)$. Therefore $W$ is finitely generated (14.6.6)(a). □


Lemma 14.6.6 Let $\varphi: V \to V'$ be a homomorphism of $R$-modules.

(a) If $V$ is finitely generated and $\varphi$ is surjective, then $V'$ is finitely generated.
(b) If the kernel and the image of $\varphi$ are finitely generated, then $V$ is finitely generated.
(c) Let $W$ be a submodule of an $R$-module $V$. If both $W$ and $\overline{V} = V/W$ are finitely
generated, then $V$ is finitely generated. If $V$ is finitely generated, so is $\overline{V}$.


Proof. (a) Suppose that $\varphi$ is surjective and let $(v_1, \ldots, v_n)$ be a set of generators for $V$.
The set $(v'_1, \ldots, v'_n)$, with $v'_i = \varphi(v_i)$, generates $V'$.


(b) We follow the proof of the dimension formula for linear transformations (4.1.5). We
choose a set of generators $(u_1, \ldots, u_k)$ for the kernel and a set of generators $(v'_1, \ldots, v'_m)$
for the image. We also choose elements $v_i$ of $V$ such that $\varphi(v_i) = v'_i$, and we show that the
set $(u_1, \ldots, u_k; v_1, \ldots, v_m)$ generates $V$. Let $v$ be any element of $V$. Then $\varphi(v)$ is a linear
combination of $(v'_1, \ldots, v'_m)$, say $\varphi(v) = a_1v'_1 + \cdots + a_mv'_m$. Let $x = a_1v_1 + \cdots + a_mv_m$.
Then $\varphi(x) = \varphi(v)$, hence $v - x$ is in the kernel of $\varphi$. So $v - x$ is a linear combination of
$(u_1, \ldots, u_k)$, say $v - x = b_1u_1 + \cdots + b_ku_k$, and

$$v = a_1v_1 + \cdots + a_mv_m + b_1u_1 + \cdots + b_ku_k.$$

Since $v$ was arbitrary, the set $(u_1, \ldots, u_k; v_1, \ldots, v_m)$ generates.


(c) This follows from (b) and (a) when we replace $\varphi$ by the canonical homomorphism
$\pi: V \to \overline{V}$. □


This theorem completes the proof of Theorem 14.4.11. 

Since principal ideal domains are noetherian, submodules of finitely generated modules 
over these rings are finitely generated. In fact, most of the rings that we have been studying 
are noetherian. This follows from another of Hilbert’s theorems: 


Theorem 14.6.7 Hilbert Basis Theorem. Let R be a noetherian ring. The polynomial ring 
R[x] is noetherian. 


The proof of this theorem is below. It shows by induction that the polynomial ring
$R[x_1, \ldots, x_n]$ in several variables over a noetherian ring $R$ is noetherian. Therefore the
rings $\mathbb{Z}[x_1, \ldots, x_n]$ and $F[x_1, \ldots, x_n]$, with $F$ a field, are noetherian. Also, quotients of
noetherian rings are noetherian:


Proposition 14.6.8 Let $R$ be a noetherian ring, and let $I$ be an ideal of $R$. Any ring that is
isomorphic to the quotient ring $\overline{R} = R/I$ is noetherian.


Proof. Let $\overline{J}$ be an ideal of $\overline{R}$, and let $\pi: R \to \overline{R}$ be the canonical map. Let $J = \pi^{-1}(\overline{J})$ be
the corresponding ideal of $R$. Since $R$ is noetherian, $J$ is finitely generated, and it follows
that $\overline{J}$ is finitely generated (14.6.6)(a). □


Corollary 14.6.9 Let $P$ be a polynomial ring in a finite number of variables over the integers
or over a field. Any ring $R$ that is isomorphic to a quotient ring $P/I$ is noetherian. □


We turn to the proof of the Hilbert Basis Theorem now. 


Lemma 14.6.10 Let $R$ be a ring and let $I$ be an ideal of the polynomial ring $R[x]$. The set $A$
whose elements are the leading coefficients of the nonzero polynomials in $I$, together with
the zero element of $R$, is an ideal of $R$, the ideal of leading coefficients.

Proof. We must show that if $\alpha$ and $\beta$ are in $A$, then $\alpha + \beta$ and $r\alpha$ are also in $A$. If any one of the
three elements $\alpha$, $\beta$, or $\alpha + \beta$ is zero, then $\alpha + \beta$ is in $A$, so we may assume that these elements
are not zero. Then $\alpha$ is the leading coefficient of an element $f$ of $I$, and $\beta$ is the leading
coefficient of an element $g$ of $I$. We multiply $f$ or $g$ by a suitable power of $x$ so that their
degrees become equal. The polynomial we get is also in $I$. Then $\alpha + \beta$ is the leading coefficient
of $f + g$. Since $I$ is an ideal, $f + g$ is in $I$ and $\alpha + \beta$ is in $A$. The proof that $r\alpha$ is in $A$ is similar. □


Proof of the Hilbert Basis Theorem. We suppose that $R$ is a noetherian ring, and we let $I$
be an ideal in the polynomial ring $R[x]$. We must show that there is a finite subset $S$ of $I$
that generates this ideal, that is, a subset such that every element of $I$ can be expressed as a linear
combination of its elements, with polynomial coefficients.


Let $A$ be the ideal of leading coefficients of $I$. Since $R$ is noetherian, $A$ has a finite set
of generators, say $(\alpha_1, \ldots, \alpha_k)$. We choose for each $i = 1, \ldots, k$ a polynomial $f_i$ in $I$ with
leading coefficient $\alpha_i$, and we multiply these polynomials by powers of $x$ as necessary, so
that their degrees become equal, say to $n$.


Next, let $P$ denote the set consisting of the polynomials in $R[x]$ of degree less than
$n$, together with 0. This is a free $R$-module with basis $(1, x, \ldots, x^{n-1})$. The subset $P \cap I$,
which consists of the polynomials of degree less than $n$ that are in $I$ together with zero, is an
$R$-submodule of $P$. Let’s call this submodule $W$. Since $P$ is a finitely generated $R$-module
and since $R$ is noetherian, $W$ is a finitely generated $R$-module. We choose generators
$(h_1, \ldots, h_\ell)$ for $W$. Every polynomial in $I$ of degree less than $n$ is a linear combination of
$(h_1, \ldots, h_\ell)$, with coefficients in $R$.

We show now that the set $(f_1, \ldots, f_k; h_1, \ldots, h_\ell)$ generates the ideal $I$. Let $g$ be an
element of $I$. We use induction on the degree $d$ of $g$.


Case 1: $d < n$. In this case, $g$ is an element of $W$, so it is a linear combination of $(h_1, \ldots, h_\ell)$
with coefficients in $R$. We don’t need polynomial coefficients here.


Case 2: $d \ge n$. Let $\beta$ be the leading coefficient of $g$, so $g = \beta x^d + (\text{lower degree terms})$.
Then $\beta$ is an element of the ideal $A$ of leading coefficients, so it is a linear combination
$\beta = r_1\alpha_1 + \cdots + r_k\alpha_k$ of the leading coefficients $\alpha_i$ of $f_i$, with coefficients in $R$. The
polynomial

$$q = \sum_i r_i x^{d-n} f_i$$

is in the ideal generated by $(f_1, \ldots, f_k)$. It has degree $d$, and its leading coefficient is $\beta$.
Therefore the degree of $g - q$ is less than $d$. By induction, $g - q$ is a polynomial combination
of $(f_1, \ldots, f_k; h_1, \ldots, h_\ell)$. Then $g = q + (g - q)$ is also such a combination. □


14.7 STRUCTURE OF ABELIAN GROUPS


The Structure Theorem for abelian groups, which is below, asserts that a finite abelian group
$V$ is a direct sum of cyclic groups. The work of the proof has been done. We know that there
exists a diagonal presentation matrix for $V$. What remains to do is to interpret the meaning
of this matrix for the group.

The definition of a direct sum of modules is the same as that of a direct sum of vector
spaces.

• Let $W_1, \ldots, W_k$ be submodules of an $R$-module $V$. Their sum is the submodule that they
generate. It consists of all elements that are sums:

(14.7.1)   $W_1 + \cdots + W_k = \{v \in V \mid v = w_1 + \cdots + w_k, \text{ with } w_i \text{ in } W_i\}$.

We say that $V$ is the direct sum of the submodules $W_1, \ldots, W_k$, and we write
$V = W_1 \oplus \cdots \oplus W_k$, if

(14.7.2)
• they generate: $V = W_1 + \cdots + W_k$, and
• they are independent: If $w_1 + \cdots + w_k = 0$, with $w_i$ in $W_i$, then $w_i = 0$ for all $i$.


Thus $V$ is the direct sum of the submodules $W_i$ if every element $v$ in $V$ can be written
uniquely in the form $v = w_1 + \cdots + w_k$, with $w_i$ in $W_i$. As is true for vector spaces, a module
$V$ is the direct sum $W_1 \oplus W_2$ of two submodules $W_1$ and $W_2$ if and only if $W_1 + W_2 = V$
and $W_1 \cap W_2 = 0$ (see (3.6.6)).


The same definitions are used for abelian groups. An abelian group $V$ is the direct sum
$W_1 \oplus \cdots \oplus W_k$ of the subgroups $W_1, \ldots, W_k$ if:
• Every element $v$ of $V$ can be written as a sum $v = w_1 + \cdots + w_k$ with $w_i$ in $W_i$, i.e.,
$V = W_1 + \cdots + W_k$.
• If a sum $w_1 + \cdots + w_k$, with $w_i$ in $W_i$, is zero, then $w_i = 0$ for all $i$.


Theorem 14.7.3 Structure Theorem for Abelian Groups. A finitely generated abelian group
$V$ is a direct sum of cyclic subgroups $C_{d_1}, \ldots, C_{d_k}$ and a free abelian group $L$:

$$V = C_{d_1} \oplus \cdots \oplus C_{d_k} \oplus L,$$

where the order $d_i$ of $C_{d_i}$ is greater than 1, and $d_i$ divides $d_{i+1}$ for $i = 1, \ldots, k - 1$.


Proof of the Structure Theorem. We choose a presentation matrix $A$ for $V$, determined by
a set of generators and a complete set of relations. We can do this because $V$ is finitely
generated and because $\mathbb{Z}$ is a noetherian ring. After a suitable change of generators and
relations, $A$ will have the diagonal form given in Theorem 14.4.6. We may eliminate any
diagonal entry that is equal to 1, and any column of zeros (see (14.5.7)). The matrix $A$ will
then have the shape

(14.7.4)   $A = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_k \\ & 0 & \end{bmatrix}$

with $d_1 > 1$ and $d_1 \mid d_2 \mid \cdots \mid d_k$. It will be an $m \times k$ matrix, $0 \le k \le m$. The meaning of this for
our abelian group is that $V$ is generated by a set of $m$ elements $B = (v_1, \ldots, v_m)$, and that

(14.7.5)   $d_1v_1 = 0, \; \ldots, \; d_kv_k = 0$

forms a complete set of relations among these generators.

Let $C_j$ denote the cyclic subgroup generated by $v_j$, for $j = 1, \ldots, m$. For $j \le k$, $C_j$
is cyclic of order $d_j$, and for $j > k$, $C_j$ is infinite cyclic. We show that $V$ is the direct sum
of these cyclic groups. Since $B$ generates, $V = C_1 + \cdots + C_m$. Suppose given a relation
$w_1 + \cdots + w_m = 0$ with $w_j$ in $C_j$. Since $v_j$ generates $C_j$, $w_j = v_jy_j$ for some integer $y_j$.
The relation is $BY = v_1y_1 + \cdots + v_my_m = 0$. Since the columns of $A$ form a complete set
of relations, $Y = AX$ for some integer vector $X$, which means that $y_j$ is a multiple of $d_j$ if
$j \le k$ and $y_j = 0$ if $j > k$. Since $v_jd_j = 0$ if $j \le k$, $w_j = 0$ if $j \le k$. The relation is trivial,
so the cyclic groups $C_j$ are independent. The direct sum of the infinite cyclic groups $C_j$ with
$j > k$ is the free abelian group $L$. □


A finite abelian group is finitely generated, so as stated above, the Structure Theorem
decomposes a finite abelian group into a direct sum of finite cyclic groups, in which the order
of each summand divides the next. The free summand will be zero.

It is sometimes convenient to decompose the cyclic groups further, into cyclic groups
of prime power order. This decomposition is based on Proposition 2.11.3: If $a$ and $b$ are
relatively prime integers, the cyclic group $C_{ab}$ of order $ab$ is isomorphic to the direct sum
$C_a \oplus C_b$ of cyclic subgroups of orders $a$ and $b$. Combining this with the Structure Theorem
yields the following:


Corollary 14.7.6 Structure Theorem (Alternate Form). Every finite abelian group is a direct
sum of cyclic groups of prime power orders. □
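Passing from the first form of the Structure Theorem to the alternate form amounts to factoring each order $d_i$ into prime powers. The following Python sketch does this by trial division, which is adequate for small orders. The function names `prime_power_parts` and `prime_power_orders` are ours.

```python
def prime_power_parts(n):
    """Split n > 1 into its prime power factors, e.g. 12 -> [4, 3]."""
    parts, p = [], 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:       # collect the full power of p dividing n
                n //= p
                q *= p
            parts.append(q)
        p += 1
    if n > 1:                       # what remains is a single prime
        parts.append(n)
    return parts

def prime_power_orders(orders):
    """Orders of the prime power cyclic summands of C_{d_1} + ... + C_{d_k}."""
    return sorted(q for d in orders for q in prime_power_parts(d))

print(prime_power_parts(30))        # C_30
print(prime_power_orders([2, 12]))  # C_2 + C_12
```

For instance, the group $C_2 \oplus C_{12}$ is a direct sum of cyclic groups of orders 2, 3, and 4.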


It is also true that the orders of the cyclic subgroups that occur are uniquely determined
by the group. If the order of $V$ is a product of distinct primes, there is no problem. For
example, if the order is 30, then $V$ must be isomorphic to $C_2 \oplus C_3 \oplus C_5$ and to $C_{30}$.
But is $C_2 \oplus C_2 \oplus C_4$ isomorphic to $C_4 \oplus C_4$? It isn’t difficult to show that it is not, by
counting elements of orders 1 or 2. The group $C_4 \oplus C_4$ contains four such elements, while
$C_2 \oplus C_2 \oplus C_4$ contains eight of them. This counting method always works.
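The count is small enough to do by brute force. In the sketch below an element of $C_{d_1} \oplus \cdots \oplus C_{d_k}$ is represented by a tuple of residues, with the $i$th entry taken modulo $d_i$. This representation and the function name `count_killed_by` are our choices.

```python
from itertools import product

def count_killed_by(orders, n):
    """Number of elements x of C_{d_1} + ... + C_{d_k} with n*x = 0."""
    return sum(
        all((n * xi) % d == 0 for xi, d in zip(x, orders))
        for x in product(*(range(d) for d in orders))
    )

print(count_killed_by([4, 4], 2))       # elements of order 1 or 2 in C_4 + C_4
print(count_killed_by([2, 2, 4], 2))    # the same count in C_2 + C_2 + C_4
```

Both groups have order 16, but the two counts differ, so the groups are not isomorphic.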


Theorem 14.7.7 Uniqueness for the Structure Theorem. Suppose that a finite abelian group
$V$ is a direct sum of cyclic groups of prime power orders $d_j = p_j^{r_j}$. The integers $d_j$ are
uniquely determined by the group $V$.


Proof. Let $p$ be one of the primes that appear in the direct sum decomposition of $V$, and let
$c_i$ denote the number of cyclic groups of order $p^i$ in the decomposition. The set of elements
whose orders divide $p^i$ is a subgroup of $V$ whose order is a power of $p$, say $p^{\ell_i}$. Let $k$ be the
largest index such that $c_k > 0$. Then

$$\begin{aligned} \ell_1 &= c_1 + c_2 + c_3 + \cdots + c_k \\ \ell_2 &= c_1 + 2c_2 + 2c_3 + \cdots + 2c_k \\ \ell_3 &= c_1 + 2c_2 + 3c_3 + \cdots + 3c_k \\ &\;\;\vdots \\ \ell_k &= c_1 + 2c_2 + 3c_3 + \cdots + kc_k. \end{aligned}$$

The exponents $\ell_i$ determine the integers $c_i$. □

The integers $d_i$ are also uniquely determined when they are chosen, as in Theorem 14.7.3,
so that $d_1 \mid \cdots \mid d_k$.
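The last step of the proof of Theorem 14.7.7 can be made explicit. Subtracting consecutive equations gives $\ell_i - \ell_{i-1} = c_i + c_{i+1} + \cdots + c_k$ (with $\ell_0 = 0$), and subtracting once more gives $c_i = (\ell_i - \ell_{i-1}) - (\ell_{i+1} - \ell_i)$, where we set $\ell_{k+1} = \ell_k$. The sketch below applies this formula. The function name `multiplicities` is ours.

```python
def multiplicities(ells):
    """Recover [c_1, ..., c_k] from the exponents [l_1, ..., l_k]."""
    k = len(ells)
    padded = [0] + list(ells) + [ells[-1]]      # l_0 = 0 and l_{k+1} = l_k
    return [(padded[i] - padded[i - 1]) - (padded[i + 1] - padded[i])
            for i in range(1, k + 1)]

# C_2 + C_2 + C_4 has 2^3 elements of order dividing 2 and 2^4 of order dividing 4.
print(multiplicities([3, 4]))
# C_4 + C_4 has 2^2 elements of order dividing 2 and 2^4 of order dividing 4.
print(multiplicities([2, 4]))
```

The first call returns the multiplicities of $C_2$ and $C_4$ in $C_2 \oplus C_2 \oplus C_4$, and the second those in $C_4 \oplus C_4$.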


14.8 APPLICATION TO LINEAR OPERATORS 


The classification of abelian groups has an analogue for the polynomial ring $R = F[t]$ in one
variable over a field $F$. Theorem 14.4.6 about diagonalizing integer matrices carries over
because the key ingredient in the proof of Theorem 14.4.6, the division algorithm, is available
in $F[t]$. And since the polynomial ring is noetherian, any finitely generated $R$-module $V$ has
a presentation matrix (14.2.7).


Theorem 14.8.1 Let $R = F[t]$ be a polynomial ring in one variable over a field $F$ and let
$A$ be an $m \times n$ $R$-matrix. There are products $Q$ and $P$ of elementary $R$-matrices such that
$A' = Q^{-1}AP$ is diagonal, each nonzero diagonal entry $d_i$ of $A'$ is a monic polynomial, and
$d_1 \mid d_2 \mid \cdots \mid d_k$. □


Example 14.8.2 Diagonalization of a matrix of polynomials:

$$A = \begin{bmatrix} t^2 - 3t + 1 & t - 2 \\ (t - 1)^3 & t^2 - 3t + 2 \end{bmatrix} \xrightarrow{\text{row}} \begin{bmatrix} t^2 - 3t + 1 & t - 2 \\ t^2 - t & 0 \end{bmatrix} \xrightarrow{\text{col}} \begin{bmatrix} -1 & t - 2 \\ t^2 - t & 0 \end{bmatrix}$$
$$\xrightarrow{\text{col}} \begin{bmatrix} -1 & 0 \\ t^2 - t & t^3 - 3t^2 + 2t \end{bmatrix} \xrightarrow{\text{row}} \begin{bmatrix} 1 & 0 \\ 0 & t^3 - 3t^2 + 2t \end{bmatrix}.$$

Note: it is not surprising that we ended up with 1 in the upper left corner in this example.
This will happen whenever the greatest common divisor of the matrix entries is 1. □
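One check on a computation like Example 14.8.2 uses determinants: each elementary operation multiplies the determinant by a nonzero constant, so $\det A$ and $\det A'$ can differ only by such a factor. The Python sketch below carries out the check with polynomials stored as coefficient lists, constant term first. The entries of $A$ are our reading of the example, and the helpers `pmul` and `psub` are ours.

```python
def pmul(f, g):
    """Product of two polynomials given as coefficient lists, constant term first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def psub(f, g):
    """Difference f - g, with trailing zero coefficients removed."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    out = [a - b for a, b in zip(f, g)]
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out

a11 = [1, -3, 1]                                   # t^2 - 3t + 1
a12 = [-2, 1]                                      # t - 2
a21 = pmul(pmul([-1, 1], [-1, 1]), [-1, 1])        # (t - 1)^3
a22 = [2, -3, 1]                                   # t^2 - 3t + 2

det_A = psub(pmul(a11, a22), pmul(a12, a21))
g = pmul(pmul([0, 1], [-1, 1]), [-2, 1])           # t(t - 1)(t - 2)
print(det_A, g)
```

With these entries $\det A = -g$, where $g = t^3 - 3t^2 + 2t$ is the second diagonal entry of $A'$, in agreement with the diagonal form.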


As is true for the ring of integers, Theorem 14.8.1 provides us with a method to
determine the polynomial solutions of a system AX = B, when A and B are
polynomial matrices (see Proposition 14.4.9).


We extend the structure theorem to polynomial rings next. To carry along the analogy
with abelian groups, we define a cyclic R-module C, where R is any ring, to be a module
that is generated by a single element v. Then there is a surjective homomorphism φ: R → C
that sends r ↦ rv. The kernel of φ, the module of relations, is a submodule of R, an ideal I.
By the First Isomorphism Theorem, C is isomorphic to the R-module R/I.

When R = F[t], the ideal I will be principal, and C will be isomorphic to R/(d) for
some polynomial d. The module of relations will be generated by a single element.


Theorem 14.8.3 Structure Theorem for Modules over Polynomial Rings. Let R = F[t] be
the ring of polynomials in one variable with coefficients in a field F.


(a) Let V be a finitely generated module over R. Then V is a direct sum of cyclic modules
$C_1, C_2, \ldots, C_k$ and a free module L, where $C_i$ is isomorphic to $R/(d_i)$, the elements
$d_1, \ldots, d_k$ are monic polynomials of positive degree, and $d_1 \mid d_2 \mid \cdots \mid d_k$.

(b) The same assertion as (a), except that the condition that $d_i$ divides $d_{i+1}$ is replaced by:
Each $d_i$ is a power of a monic irreducible polynomial. □

It is also true that the prime powers occurring in (b) are unique, but we won't take the time
to prove this.


For example, let R = ℝ[t], and let V be the R-module presented by the matrix A of Example
14.8.2. It is also presented by the diagonal matrix

$$
A' = \begin{bmatrix} 1 & 0 \\ 0 & t^3-3t^2+2t \end{bmatrix},
$$

and we can drop the first row and column from this matrix (14.5.7). So V is presented by
the 1 × 1 matrix [g], where $g(t) = t^3 - 3t^2 + 2t = t(t-1)(t-2)$. This means that V is a
cyclic module, isomorphic to C = R/(g). Since g has three relatively prime factors, V can
be further decomposed. It is isomorphic to a direct sum of cyclic R-modules:


(14.8.4)  $R/(g) \approx (R/(t)) \oplus (R/(t-1)) \oplus (R/(t-2))$.
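One way to see the isomorphism (14.8.4) concretely: since R/(t − c) is identified with the scalars by evaluation at c, a residue class modulo g is determined by its values at the three roots 0, 1, 2 of g. The following sketch, which uses rational coefficients and helper names of our own choosing, sends a polynomial to its three values and reconstructs the representative of degree less than 3 by Lagrange interpolation.

```python
from fractions import Fraction

NODES = (0, 1, 2)  # the roots of g(t) = t(t - 1)(t - 2)


def evaluate(coeffs, x):
    """Evaluate the polynomial with coefficients [a0, a1, ...] at x (Horner's rule)."""
    result = Fraction(0)
    for a in reversed(coeffs):
        result = result * x + a
    return result


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def to_components(coeffs):
    """Image of f in R/(t) + R/(t-1) + R/(t-2): the values (f(0), f(1), f(2))."""
    return tuple(evaluate(coeffs, c) for c in NODES)


def from_components(values):
    """Lagrange interpolation: the polynomial of degree < 3 with the given values."""
    total = [Fraction(0)] * len(NODES)
    for i, (ci, vi) in enumerate(zip(NODES, values)):
        basis = [Fraction(1)]
        for j, cj in enumerate(NODES):
            if j != i:
                # multiply by (t - cj) / (ci - cj)
                basis = poly_mul(basis, [Fraction(-cj, ci - cj), Fraction(1, ci - cj)])
        total = [s + vi * b for s, b in zip(total, basis)]
    return total


# t^4 is congruent to 7t^2 - 6t modulo g; both have values (0, 1, 16) at 0, 1, 2.
components = to_components([0, 0, 0, 0, 1])
representative = from_components(components)
```

Multiplication of residue classes corresponds to componentwise multiplication of the value triples, which is why the decomposition respects the module structure.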


We now apply the theory we have developed to study linear operators on vector spaces 
over a field. This application provides a good example of how abstraction can lead to new 
insights. The method developed for abelian groups is extended formally to modules over 
polynomial rings, and is then applied in a concrete new situation. This was not the historical 
development. The theories for abelian groups and for linear operators were developed 
independently and were tied together later. But it is striking that the two cases, abelian 
groups and linear operators, can end up looking so different when the same theory is applied 
to them. 

The key observation that allows us to proceed is that if we are given a linear operator


(14.8.5)  $T: V \to V$


on a vector space over a field F, we can use this operator to make V into a module over the
polynomial ring F[t]. To do so, we must define multiplication of a vector v by a polynomial
$f(t) = a_n t^n + \cdots + a_1 t + a_0$. We set


(14.8.6)  $f(t)v = a_n T^n(v) + a_{n-1} T^{n-1}(v) + \cdots + a_1 T(v) + a_0 v.$


The right side could also be written as [f(T)](v), where f(T) denotes the linear operator
$a_n T^n + a_{n-1} T^{n-1} + \cdots + a_1 T + a_0 I$. (The brackets have been added to make it clear that
it is the operator f(T) that acts on v.) With this notation, we obtain the formulas


(14.8.7)  $tv = T(v)$ and $f(t)v = [f(T)](v)$.


The fact that rule (14.8.6) makes V into an F[t]-module is easy to verify, and the formulas
(14.8.7) may appear tautological. They raise the question of why we need a new symbol t.
But f(t) is a polynomial, while f(T) is a linear operator.
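Rule (14.8.6) is easy to carry out by machine. The sketch below is a minimal illustration, assuming that T is given by a square matrix (a list of rows) and that f is given by its coefficient list; the names `mat_vec` and `act` are ours. It evaluates f(t)v by Horner's rule, so only matrix-vector products are needed.

```python
def mat_vec(T, v):
    """Apply the matrix T (a list of rows) to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in T]


def act(coeffs, T, v):
    """Return f(t)v = a_n T^n v + ... + a_1 T v + a_0 v for coeffs = [a_0, ..., a_n]."""
    result = [0] * len(v)
    for a in reversed(coeffs):
        # Horner step: result <- T(result) + a * v
        result = [a * x + y for x, y in zip(v, mat_vec(T, result))]
    return result


# Rotation by a quarter turn satisfies T^2 = -I, so f(t) = t^2 + 1 acts as zero.
T = [[0, -1],
     [1, 0]]
v = [3, 4]
tv = act([0, 1], T, v)        # the polynomial t acts as T
fv = act([1, 0, 1], T, v)     # the polynomial t^2 + 1
```

In this example every vector is killed by $t^2 + 1$, so the module structure on the plane factors through $F[t]/(t^2 + 1)$.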

Conversely, if V is an F[t]-module, scalar multiplication of elements of V by a
polynomial is defined. In particular, we are given a rule for multiplying by the constant
polynomials, the elements of F. If we keep the rule for multiplying by constants but forget
for the moment about multiplication by nonconstant polynomials, then the axioms for a
module show that V becomes a vector space over F (14.1.1). Next, we can multiply elements
of V by the polynomial t. Let us denote the operation of multiplication by t on V as T. So T
is the map


(14.8.8)  $T: V \to V$, defined by $T(v) = tv$.


This map is a linear operator when V is considered as a vector space over F. By the
distributive law, t(v + v′) = tv + tv′, therefore T(v + v′) = T(v) + T(v′). If c is a scalar,
then tcv = ctv, and therefore T(cv) = cT(v). So an F[t]-module V provides us with a
linear operator on a vector space. The rules we have described, going from linear operators
to modules and back, are inverse operations.


(14.8.9)  Linear operator on an F-vector space and F[t]-module are equivalent concepts.


We will want to apply this observation to finite-dimensional vector spaces, but we note
in passing the linear operator that corresponds to the free F[t]-module of rank 1. When F[t]
is considered as a vector space over F, the monomials $(1, t, t^2, \ldots)$ form a basis, and we
can use this basis to identify F[t] with the infinite-dimensional space Z, the space of infinite
row vectors $(a_0, a_1, a_2, \ldots)$ with finitely many entries different from zero that was defined
in (3.7.2). Multiplication by t on F[t] corresponds to the shift operator T:


$$
(a_0, a_1, a_2, \ldots) \mapsto (0, a_0, a_1, a_2, \ldots).
$$


The shift operator on the space Z corresponds to the free F[t]-module of rank 1.

We now begin our application to linear operators. Given a linear operator T on a
vector space V over F, we may also view V as an F[t]-module. We suppose that V is
finite-dimensional as a vector space, say of dimension n. Then it is finitely generated as a
module, and it has a presentation matrix. There is some danger of confusion here, because
there are two matrices around: the presentation matrix for the module V, and the matrix of
the linear operator T. The presentation matrix is an r × s matrix with polynomial entries,
where r is the number of chosen generators for the module and s is the number of relations.
The matrix of the linear operator is an n × n matrix whose entries are scalars, where n is the
dimension of V. Both matrices contain the information needed to describe the module and
the linear operator.

Regarding V as an F[t]-module, we can apply Theorem 14.8.3 to conclude that V is a
direct sum of cyclic submodules, say


$$
V = W_1 \oplus \cdots \oplus W_k,
$$


where $W_i$ is isomorphic to $F[t]/(f_i)$, $f_i$ being a monic polynomial in F[t]. When V is
finite-dimensional, the free summand is zero.

To interpret the meaning of the direct sum decomposition for the linear operator T, we
choose bases $B_i$ for the subspaces $W_i$. Then with respect to the basis $B = (B_1, \ldots, B_k)$, the
matrix of T has a block form (4.4.4), where the blocks are the matrices of T restricted to the
invariant subspaces $W_i$. So it will be enough to examine the operator that corresponds
to a cyclic module.
Let W be a cyclic F[t]-module, generated as a module by a single element that we
label as $w_0$. Since every ideal of F[t] is principal, W will be isomorphic to F[t]/(f),
where $f = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0$ is a monic polynomial in F[t]. The isomorphism
F[t]/(f) → W will send $1 \mapsto w_0$. The set $(1, t, \ldots, t^{n-1})$ is a basis of F[t]/(f) (11.5.5), so
the set $(w_0, tw_0, t^2 w_0, \ldots, t^{n-1} w_0)$ is a basis of W as vector space.

The corresponding linear operator T: W → W is multiplication by t. Written in terms
of T, the basis of W is $(w_0, w_1, \ldots, w_{n-1})$, with $w_j = T^j w_0$. Then


$$
T(w_0) = w_1, \quad T(w_1) = w_2, \quad \ldots, \quad T(w_{n-2}) = w_{n-1}, \quad\text{and}
$$

$$
\begin{aligned}
[f(T)]w_0 &= T^n w_0 + a_{n-1}T^{n-1} w_0 + \cdots + a_1 T w_0 + a_0 w_0 \\
&= T w_{n-1} + a_{n-1} w_{n-1} + \cdots + a_1 w_1 + a_0 w_0 = 0.
\end{aligned}
$$


This determines the matrix of T. It has the form illustrated below for small values of n:

(14.8.10)  $\begin{bmatrix} -a_0 \end{bmatrix}, \quad \begin{bmatrix} 0 & -a_0 \\ 1 & -a_1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & -a_0 \\ 1 & 0 & -a_1 \\ 0 & 1 & -a_2 \end{bmatrix}, \quad \ldots$


The characteristic polynomial of this matrix is f(t).
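The pattern in (14.8.10) can be checked by machine. The sketch below is an illustration under our own naming: `companion` builds the matrix from the coefficients $a_0, \ldots, a_{n-1}$ of f, and `char_poly` computes $\det(tI - A)$ with the Faddeev-LeVerrier recursion, a standard method that is not the one used in the text. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def companion(a):
    """Matrix of multiplication by t on F[t]/(f) in the basis (w_0, ..., w_{n-1}),
    where f = t^n + a[n-1] t^(n-1) + ... + a[1] t + a[0]."""
    n = len(a)
    C = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n - 1):
        C[j + 1][j] = Fraction(1)        # T(w_j) = w_{j+1}
    for i in range(n):
        C[i][n - 1] = -Fraction(a[i])    # T(w_{n-1}) = -a_0 w_0 - ... - a_{n-1} w_{n-1}
    return C


def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_{n-1}, 1] of det(tI - A), by Faddeev-LeVerrier."""
    n = len(A)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(mat_mul(A, M)[i][i] for i in range(n))
        coeffs[n - k] = -trace / k
    return coeffs


# f = t^3 - 3t^2 + 2t, the polynomial g of Example 14.8.2.
C = companion([0, 2, -3])
p = char_poly(C)
```

The division by k in the recursion means this particular method needs a field of characteristic zero (or larger than n); the statement about the characteristic polynomial itself holds over any field.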


Theorem 14.8.11 Let T be a linear operator on a finite-dimensional vector space V over a 
field F. There is a basis for V with respect to which the matrix of T is made up of blocks of 
the type shown above. □


This form for the matrix of a linear operator is called a rational canonical form. It is the best 
available for an arbitrary field. 


Example 14.8.12 Let F = ℝ. The matrix A shown below is in rational canonical form. Its
characteristic polynomial is $t^3 - 1$. Since this is a product of relatively prime polynomials:
$t^3 - 1 = (t - 1)(t^2 + t + 1)$, the cyclic ℝ[t]-module that it presents is a direct sum of cyclic
modules. The matrix $A'$ is another rational canonical form that describes the same module.
Over the complex numbers, A is diagonalizable. Its diagonal form is $A''$, where $\omega = e^{2\pi i/3}$.

(14.8.13)  $A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad A' = \begin{bmatrix} 1 & & \\ & 0 & -1 \\ & 1 & -1 \end{bmatrix}, \quad A'' = \begin{bmatrix} 1 & & \\ & \omega & \\ & & \omega^2 \end{bmatrix}.$

□


Various relations between properties of an F[t]-module and the corresponding linear
operator are summed up in the table below.


(14.8.14)
F[t]-module                      Linear operator T
multiplication by t              operation of T
free module of rank 1            shift operator
submodule                        T-invariant subspace
direct sum of submodules         direct sum of T-invariant subspaces
cyclic module generated by w     subspace spanned by w, T(w), T²(w), ...


14.9 POLYNOMIAL RINGS IN SEVERAL VARIABLES


Modules over a ring become increasingly complicated with increasing complication of the
ring, and it can be difficult to determine whether or not an explicitly presented module is
free. In this section we describe, without proof, a theorem that characterizes free modules
over polynomial rings in several variables. This theorem was proved by Quillen and Suslin
in 1976.

Let $R = \mathbb{C}[x_1, \ldots, x_k]$ be the polynomial ring in k variables, and let V be a finitely
generated R-module. Let A be a presentation matrix for V. The entries of A will be
polynomials $a_{ij}(x)$, and if A is an m × n matrix, then V is isomorphic to the cokernel
$R^m/AR^n$ of multiplication by A on R-vectors.

When we evaluate the matrix entries $a_{ij}(x)$ at a point $(c_1, \ldots, c_k)$ of $\mathbb{C}^k$, we obtain a
complex matrix A(c) whose i, j-entry is $a_{ij}(c)$.


Theorem 14.9.1 Let V be a finitely generated module over the polynomial ring $\mathbb{C}[x_1, \ldots, x_k]$,
and let A be an m × n presentation matrix for V. Denote by A(c) the evaluation of A at a
point c of $\mathbb{C}^k$. Then V is a free module of rank r if and only if the matrix A(c) has rank m − r
at every point c.


The proof of this theorem requires too much background to give here. However, we can use
it to determine whether or not a given module is free. For example, let V be the module
over ℂ[x, y] presented by the 4 × 2 matrix


(14.9.2)  $A = \begin{bmatrix} 1 & x \\ y & x+3 \\ x & y \\ x^2 & y^2 \end{bmatrix}.$

So V has four generators, say $v_1, \ldots, v_4$, and two relations:


$$
v_1 + y v_2 + x v_3 + x^2 v_4 = 0 \quad\text{and}\quad x v_1 + (x+3) v_2 + y v_3 + y^2 v_4 = 0.
$$


It isn't very hard to show that A(c) has rank 2 for every point c in $\mathbb{C}^2$. Theorem 14.9.1 tells
us that V is a free module of rank 2.
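The rank claim can be checked with 2 × 2 minors. The derivation below is a sketch that assumes the matrix (14.9.2) has rows (1, x), (y, x + 3), (x, y), (x², y²), as the two relations indicate; the symbols $\Delta_{1j}$ for the minors on rows 1 and j are our notation.

```latex
\begin{aligned}
\Delta_{12} &= (x+3) - xy, \qquad \Delta_{13} = y - x^2, \qquad \Delta_{14} = y^2 - x^3. \\[4pt]
&\text{Suppose all three vanish at a point } (a,b). \text{ Then } b = a^2 \text{ from } \Delta_{13}, \text{ and} \\
\Delta_{14}(a,b) &= a^4 - a^3 = a^3(a-1) = 0 \;\Longrightarrow\; a \in \{0, 1\}. \\
\Delta_{12}(a,b) &= a + 3 - a^3 = 3 \neq 0 \quad\text{for } a = 0 \text{ and for } a = 1.
\end{aligned}
```

This is a contradiction, so at every point at least one of these minors is nonzero. Since A(c) has only two columns, its rank is exactly 2 everywhere, and m − r = 2 with m = 4 gives r = 2.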

One can get an intuitive understanding for this theorem by considering the vector
space W(c) spanned by the columns of the matrix A(c). It is a subspace of $\mathbb{C}^m$. As c varies
in the space $\mathbb{C}^k$, the matrix A(c) varies continuously. Therefore the subspace W(c) will also
vary continuously, provided that its dimension does not jump around. Continuous families
of vector spaces of constant dimension, parametrized by a topological space $\mathbb{C}^k$, are called
vector bundles over $\mathbb{C}^k$. The module V is free if and only if the family of vector spaces W(c)
forms a vector bundle.


Par une déformation coutumière aux mathématiciens,
je m'en tenais au point de vue trop restreint.


—Jean-Louis Verdier 
EXERCISES


Section 1 Modules 
1.1. Let R be a ring, and let V denote the R-module R. Determine all homomorphisms
φ: V → V.

1.2. Let V be an abelian group. Prove that if V has a structure of ℚ-module with its given law
of composition as addition, then that structure is uniquely determined.

1.3. Let R = ℤ[α] be the ring generated over ℤ by an algebraic integer α. Prove that for any
integer m, R/mR is finite, and determine its order.

1.4. A module is called simple if it is not the zero module and if it has no proper submodule.
(a) Prove that any simple R-module is isomorphic to an R-module of the form R/M,
where M is a maximal ideal.
(b) Prove Schur's Lemma: Let φ: S → S′ be a homomorphism of simple modules. Then
φ is either zero, or an isomorphism.


Section 2 Free Modules


2.1. Let R = ℂ[x, y], and let M be the ideal of R generated by the two elements x and y. Is
M a free R-module?


2.2. Prove that a ring R having the property that every finitely generated R-module is free is 
either a field or the zero ring. 


2.3. Let A be the matrix of a homomorphism $\varphi: \mathbb{Z}^n \to \mathbb{Z}^m$ of free ℤ-modules.
(a) Prove that φ is injective if and only if the rank of A, as a real matrix, is n.
(b) Prove that φ is surjective if and only if the greatest common divisor of the
determinants of the m × m minors of A is 1.


2.4. Let I be an ideal of a ring R.
(a) Under what circumstances is I a free R-module?
(b) Under what circumstances is the quotient ring R/I a free R-module?


Section 3 Identities 

3.1. Let $\tilde f$ denote the function on $\mathbb{C}^n$ defined by evaluation of a (formal) complex polynomial
$f(x_1, \ldots, x_n)$. Prove that if $\tilde f$ is the zero function, then f is the zero polynomial.

3.2. It might be convenient to verify an identity only for the real numbers. Would this
suffice?

3.3. Let A and B be m × m and n × n R-matrices, respectively. Use permanence of identities
to prove that trace of the linear operator f(M) = AMB on the space $R^{m \times n}$ is the product
(trace A)(trace B).


3.4. In each case, decide whether or not permanence of identities allows the result to be
carried over from the complex numbers to an arbitrary commutative ring.


(a) the associative law for matrix multiplication, 
(b) the Cayley-Hamilton Theorem, 




(c) Cramer’s Rule, 

(d) the product rule, quotient rule, and chain rule for differentiation of polynomials, 
(e) the fact that a polynomial of degree n has at most n roots,

(f) Taylor expansion of a polynomial. 


Section 4 Diagonalizing Integer Matrices 


4.1. (a) Reduce each matrix to diagonal form by integer row and column operations. 


EA a? ron 

=i. 2 2 4 6 -4 6 -2 

(b) For the first matrix, let $V = \mathbb{Z}^2$ and let L = AV. Draw the sublattice L, and find
bases of V and L that exhibit the diagonalization.

(c) Determine integer matrices $Q^{-1}$ and P that diagonalize the second matrix.

4.2. Let $d_1, d_2, \ldots$ be the integers referred to in Theorem 14.4.6. Prove that $d_1$ is the greatest
common divisor of the entries $a_{ij}$ of A.


4.3. Determine all integer solutions to the system of equations AX = 0, when 


Ag= le } a Find a basis for the space of integer column vectors B such that AX = B 


has a solution. 
4.4. Find a basis for the ℤ-module of integer solutions of the system of equations
x + 2y + 3z = 0, x + 4y + 9z = 0.


4.5. Let α, β, γ be complex numbers. Under what conditions is the set of integer linear
combinations {ℓα + mβ + nγ | ℓ, m, n ∈ ℤ} a lattice in the complex plane?


4.6. Let $\varphi: \mathbb{Z}^k \to \mathbb{Z}^k$ be a homomorphism given by multiplication by an integer matrix A.
Show that the image of φ is of finite index if and only if A is nonsingular and that if so,
then the index is equal to |det A|.


4.7. Let $A = (a_1, \ldots, a_n)^t$ be an integer column vector, and let d be the greatest common
divisor of $a_1, \ldots, a_n$. Prove that there is a matrix $P \in GL_n(\mathbb{Z})$ such that
$PA = (d, 0, \ldots, 0)^t$.


4.8. Use invertible row and column operations in the ring Z[i] of Gauss integers to diagonalize 


: Di DE 
the matrix E ~i 9 | 
4.9. Use diagonalization to prove that if L ⊂ M are lattices, then $[M:L] = \Delta(L)/\Delta(M)$.


Section 5 Generators and Relations


5.1. Let R = ℤ[δ], where $\delta = \sqrt{-5}$. Determine a presentation matrix as R-module for the
ideal (2, 1 + δ).


bai 2 
5.2. Identify the abelian group presented by the matrix] 1 1 1.|. 


a 3 6 
Section 6 Noetherian Rings


6.1. Let $V \subset \mathbb{C}^n$ be the locus of common zeros of an infinite set of polynomials $f_1, f_2, f_3, \ldots$.
Prove that there is a finite subset of these polynomials whose zeros define the same locus.


6.2. Find an example of a ring R and an ideal I of R that is not finitely generated.


Section 7 Structure of Abelian Groups 


7.1. Find a direct sum of cyclic groups isomorphic to the abelian group presented by the matrix 
aie hae 
22 a) 
patie?) 


7.2. Write the abelian group generated by x and y, with the relation 3x + 4y = 0 as a direct 
sum of cyclic groups. 


7.3. Find an isomorphic direct product of cyclic groups, when V is the abelian group generated 
by x, y, Z, with the given relations. 
(a) 3x+2y+8z =0,2x+4z=0 
(b) x + y = 0, 2x = 0, 4x + 2z = 0, 4x + 2y + 2z = 0
(c) 2x + y = 0, x − y + 3z = 0
(d) 7x + 5y + 2z = 0, 3x + 3y = 0, 13x + 11y + 2z = 0
7.4. In each case, identify the abelian group that has the given presentation matrix: 


1 0 
2 0 ZS 2 4 Zeit 4 6 
EF} (S}-2 oe 8). SLL ss. 5) 
0 0 
7.5. Determine the number of isomorphism classes of abelian groups of order 400. 


7.6. (a) Let a and b be relatively prime positive integers. By manipulating the diagonal matrix
with diagonal entries a and b, prove that the cyclic group $C_{ab}$ is isomorphic to the
product $C_a \oplus C_b$.

(b) What can you say if the assumption that a and b are relatively prime is dropped?


7.7. Let R = ℤ[i] and let V be the R-module generated by elements $v_1$ and $v_2$ with relations
$(1 + i)v_1 + (2 - i)v_2 = 0$, $3v_1 + 5iv_2 = 0$. Write this module as a direct sum of cyclic
modules.

7.8. Let $F = \mathbb{F}_p$. For which prime integers p does the additive group $F^1$ have a structure of
ℤ[i]-module? How about $F^2$?


7.9. Show that the following concepts are equivalent:

• R-module, where R = ℤ[i],
• abelian group V, with a homomorphism φ: V → V such that φ ∘ φ = −identity.


Section 8 Application to Linear Operators 


8.1. Let T be the linear operator on C* whose matrix is F il Is the corresponding 


C[t]-module cyclic? 
8.2. Let M be a ℂ[t]-module of the form $\mathbb{C}[t]/(t - \alpha)^n$. Show that there is a ℂ-basis for M such
that the matrix of the corresponding linear operator is a Jordan block.
8.3. Let R = F[t] be the polynomial ring in one variable over a field F, and let V be the
R-module generated by an element v that satisfies the relation $(t^3 + 3t + 2)v = 0$. Choose
a basis for V as F-vector space, and determine the matrix of the operator of multiplication
by t with respect to this basis.


8.4. Let V be an F[t]-module, and let $\mathbf{B} = (v_1, \ldots, v_n)$ be a basis for V as F-vector space.
Let B be the matrix of T with respect to this basis. Prove that A = tI − B is a presentation
matrix for the module.


8.5. Prove that the characteristic polynomial of the matrix (14.8.10) is f(t).

8.6. Classify finitely generated modules over the ring ℂ[ε], where $\epsilon^2 = 0$.


Section 9 Polynomial Rings in Several Variables 


9.1. Determine whether or not the modules over ℂ[x, y] presented by the following matrices


are free. 
x? +1 b eae am ne 
@| aces Sal oni ae oh 
x 2y 


9.2. Prove that the module presented by (14.9.2) is free by exhibiting a basis. 
9.3. Following the model of the polynomial ring in one variable, describe modules over the 
ring C[x, y] in terms of complex vector spaces with additional structure. 


9.4. Prove the easy half of the theorem of Quillen and Suslin: If V is free, then the rank of 
A(c) is constant. 


ee 
that the residue of A in R/P has rank ] for every prime ideal P of R, but that V is not a 
free module. 


9.5. Let R = Z{/-5]. and let V be the module presented by the matrix A = é |, Prove 


Miscellaneous Problems 


M.1. In how many ways can the additive group Z/5Z be given the structure of a module over 
the Gauss integers? 


M.2. Classify finitely generated modules over the ring Z/(6). 


M.3. Let A be a finite abelian group, and let $\varphi: A \to \mathbb{C}^\times$ be a homomorphism that is not the
trivial homomorphism. Prove that $\sum_{a \in A} \varphi(a) = 0$.


M.4. When an integer 2 × 2 matrix A is diagonalized by $Q^{-1}AP$, how unique are the matrices
P and Q?


M.5. Which matrices A in $GL_2(\mathbb{R})$ stabilize some lattice L in $\mathbb{R}^2$?
M.6. (a) Describe the orbits of right multiplication by $G = GL_2(\mathbb{Z})$ on the space of 2 × 2
integer matrices.

(b) Show that for any integer matrix A, there is an invertible integer matrix P such that
AP has the following Hermitian normal form:

$$
\begin{bmatrix}
d_1 & 0 & 0 & \cdots \\
a_2 & d_2 & 0 & \cdots \\
a_3 & b_3 & d_3 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix},
$$

where the entries are nonnegative, $a_2 < d_2$, $a_3, b_3 < d_3$, etc.


M.7. Let S be a subring of the polynomial ring R = ℂ[t] that contains ℂ and is not equal to ℂ.
Prove that R is a finitely generated S-module.

*M.8. (a) Let α be a complex number, and let ℤ[α] be the subring of ℂ generated by α. Prove
that α is an algebraic integer if and only if ℤ[α] is a finitely generated abelian group.
(b) Prove that if α and β are algebraic integers, then the subring ℤ[α, β] of ℂ that they
generate is a finitely generated abelian group.
(c) Prove that the algebraic integers form a subring of ℂ.

*M.9. Consider the Euclidean space $V = \mathbb{R}^k$, with dot product (v · w). A lattice L in V is a
discrete subgroup of $V^+$ that contains k independent vectors. If L is a lattice, define
$L^* = \{w \mid (v \cdot w) \in \mathbb{Z} \text{ for all } v \in L\}$.

(a) Show that L has a lattice basis $B = (v_1, \ldots, v_k)$, a set of k vectors that spans L as
ℤ-module.

(b) Show that $L^*$ is a lattice, and describe how one can determine a lattice basis for $L^*$
in terms of B.

(c) Under what conditions is L a sublattice of $L^*$?

(d) Suppose that $L \subset L^*$. Find a formula for the index $[L^* : L]$.


*M.10. (a) Prove that the multiplicative group $\mathbb{Q}^\times$ of rational numbers is isomorphic to the
direct sum of a cyclic group of order 2 and a free abelian group with countably many
generators.

(b) Prove that the additive group $\mathbb{Q}^+$ of rational numbers is not a direct sum of two
proper subgroups.
(c) Prove that the quotient group $\mathbb{Q}^+/\mathbb{Z}^+$ is not a direct sum of cyclic groups.


Fields


Our difficulty is not in the proofs, but in learning what to prove. 


—Emil Artin 


15.1 EXAMPLES OF FIELDS 


Much of the theory of fields has to do with a pair F ⊂ K of fields, one contained in the other.
Given such a pair, K is called a field extension of F, or an extension field. The notation K/F 
will indicate that K is a field extension of F. 

Here are the three most important classes of fields. 


Number Fields 


A number field K is a subfield of C. 


Any subfield of C contains the field Q of rational numbers, so it is a field extension of Q. The 
number fields most commonly studied are algebraic number fields, all of whose elements are 
algebraic numbers. We studied quadratic number fields in Chapter 13. 


Finite Fields 

A finite field is a field that contains finitely many elements. 

A finite field contains one of the prime fields $\mathbb{F}_p$, and therefore it is an extension of that field.
Finite fields are described in Section 15.7. 

Function Fields 

Extensions of the field F = C(t) of rational functions are called function fields. 


A function field can be defined by an equation f(t, x) = 0, where f is an irreducible complex
polynomial in the variables t and x, such as $f(t, x) = x^2 - t^3 + t$, for example. We may use
the equation f(t, x) = 0 to define x "implicitly" as a function x(t) of t, as we learn to do in
calculus. In our example, this function is $x(t) = \sqrt{t^3 - t}$. The corresponding function field
K consists of the combinations $p + q\sqrt{t^3 - t}$, where p and q are rational functions in t. One
can work in this field just as one would in a field such as $\mathbb{Q}(\sqrt{-5})$. For most polynomials
f(t, x), there won't be an explicit expression for the implicitly defined function x(t), but
by definition, it satisfies the equation f(t, x(t)) = 0. We will see in Section 15.9 that x(t)
defines an extension field of F.


15.2 ALGEBRAIC AND TRANSCENDENTAL ELEMENTS 


Let K be an extension of a field F, and let α be an element of K. By analogy with the
definition of algebraic numbers (11.1), α is algebraic over F if it is a root of a monic
polynomial with coefficients in F, say


(15.2.1)  $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$, with $a_i$ in F,


and f(α) = 0. An element is transcendental over F if it is not algebraic over F, that is, if it is not a
root of any such polynomial.

These properties, algebraic and transcendental, depend on F. The complex number
2πi is algebraic over the field of real numbers but transcendental over the field of rational
numbers. Every element α of a field K is algebraic over K, because it is the root of the
polynomial x − α, which has coefficients in K.

The two possibilities for α can be described in terms of the substitution homomorphism


(15.2.2)  $\varphi: F[x] \to K$, defined by $x \mapsto \alpha$.


An element α is transcendental over F if φ is injective, and algebraic over F if φ is not
injective, that is, if the kernel of φ is not zero. We won't have much to say about the case
that α is transcendental.

Suppose that α is algebraic over F. Since F[x] is a principal ideal domain, the kernel
of φ is a principal ideal, generated by a monic polynomial f(x) with coefficients in F. This
polynomial can be described in various ways.


Proposition 15.2.3 Let α be an element of an extension field K of a field F that is
algebraic over F. The following conditions on a monic polynomial f with coefficients in
F are equivalent. The unique monic polynomial that satisfies these conditions is called the
irreducible polynomial for α over F.


• f is the monic polynomial of lowest degree in F[x] that has α as a root.

• f is an irreducible element of F[x], and α is a root of f.

• f has coefficients in F, α is a root of f, and the principal ideal of F[x] that is
generated by f is a maximal ideal.

• α is a root of f, and if g is any polynomial in F[x] that has α as a root, then f
divides g. □


The degree of the irreducible polynomial for α over F is called the degree of α over F.


It is important to keep in mind that the irreducible polynomial f depends on F as
well as on α, because irreducibility of a polynomial depends on the field. The irreducible
polynomial for $\sqrt{i}$ over ℚ is $x^4 + 1$. This polynomial factors in the field ℚ(i). The
irreducible polynomial for $\sqrt{i}$ over ℚ(i) is $x^2 - i$. When there are several fields around, it is
ambiguous to say that a polynomial is irreducible. It is better to say that f is irreducible over
F, or that it is an irreducible element of F[x].

Let K be an extension field of F. The subfield of K generated by an element α of K
will be denoted by F(α):


(15.2.4)  F(α) is the smallest subfield of K that contains F and α.


Similarly, if $\alpha_1, \ldots, \alpha_k$ are elements of an extension field K of F, the notation $F(\alpha_1, \ldots, \alpha_k)$
will stand for the smallest subfield of K that contains these elements and F.

As in Chapter 11, we denote the ring generated by α over F by F[α]. It is the image
of the map φ: F[x] → K defined above, and it consists of the elements β of K that can be
expressed as polynomials in α with coefficients in F:


(15.2.5)  $\beta = b_n\alpha^n + \cdots + b_1\alpha + b_0$, $b_i$ in F.


The field F(α) is isomorphic to the field of fractions of F[α]. Its elements are ratios of
elements of the form (15.2.5) (see Section 11.7).

Similarly, if $\alpha_1, \ldots, \alpha_k$ are elements of K, the smallest subring of K that contains F
and these elements is denoted by $F[\alpha_1, \ldots, \alpha_k]$. It consists of the elements β of K that can
be expressed as polynomials in the $\alpha_i$ with coefficients in F. The field $F(\alpha_1, \ldots, \alpha_k)$ is the
field of fractions of the ring $F[\alpha_1, \ldots, \alpha_k]$.

If an element α of K is transcendental over F, the map F[x] → F[α] is an isomorphism.
In that case F(α) is isomorphic to the field F(x) of rational functions. The field extensions
F(α) are isomorphic for all transcendental elements α.

Things are different when a is algebraic: 


Proposition 15.2.6 Let α be an element of an extension field K/F which is algebraic over
F, and let f be the irreducible polynomial for α over F.


(a) The canonical map F[x]/(f) → F[α] is an isomorphism, and F[α] is a field. Thus
F[α] = F(α).

(b) More generally, let $\alpha_1, \ldots, \alpha_k$ be elements of an extension field K/F, which are
algebraic over F. The ring $F[\alpha_1, \ldots, \alpha_k]$ is equal to the field $F(\alpha_1, \ldots, \alpha_k)$.


Proof. (a) Let φ: F[x] → K be the map (15.2.2). Since the ideal (f) is maximal, f(x)
generates the kernel, and F[x]/(f) is isomorphic to the image of φ, which is F[α].
Moreover, F[x]/(f) is a field, and therefore F[α] is a field. Since F(α) is the fraction field
of F[α], it is equal to F[α].


(b) This follows by induction:

$$
F[\alpha_1, \ldots, \alpha_k] = F[\alpha_1, \ldots, \alpha_{k-1}][\alpha_k] = F(\alpha_1, \ldots, \alpha_{k-1})[\alpha_k] = F(\alpha_1, \ldots, \alpha_k). \qquad \square
$$

The next proposition is a special case of Proposition 11.5.5.

Proposition 15.2.7 Let α be an algebraic element over F, and let f be the irreducible
polynomial for α over F. If f(x) has degree n, i.e., if α has degree n over F, then
$(1, \alpha, \ldots, \alpha^{n-1})$ is a basis for F(α) as a vector space over F. □

For instance, the irreducible polynomial for $\omega = e^{2\pi i/3}$ over ℚ is $x^2 + x + 1$. The degree of
ω over ℚ is 2, and (1, ω) is a basis for ℚ(ω) over ℚ.
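To illustrate how the basis (1, ω) supports computation, here is a minimal sketch of arithmetic in ℚ(ω). An element a + bω is stored as the pair (a, b), and products are reduced with the relation ω² = −1 − ω. The function names are ours, and the inverse uses the norm a² − ab + b², a standard fact that the text does not state at this point.

```python
from fractions import Fraction


def mul(u, v):
    """Multiply a + b*w and c + d*w, reducing with w^2 = -1 - w."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c - b * d)


def inv(u):
    """Inverse of a nonzero a + b*w: the conjugate (a - b) - b*w divided by the norm."""
    a, b = u
    norm = a * a - a * b + b * b   # nonzero whenever (a, b) != (0, 0)
    return (Fraction(a - b) / norm, Fraction(-b) / norm)


omega = (Fraction(0), Fraction(1))
omega_squared = mul(omega, omega)
omega_cubed = mul(omega_squared, omega)
check = mul((2, 3), inv((2, 3)))
```

Every nonzero element has an inverse of the same form, which is the statement F[α] = F(α) of Proposition 15.2.6 in this example.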


It may not be easy to tell whether two algebraic elements α and β generate isomorphic
field extensions, though Proposition 15.2.7 provides a necessary condition: They must have
the same degree over F, because the degree of α over F is the dimension of F(α) as an
F-vector space. This is obviously not a sufficient condition. All of the imaginary quadratic
fields studied in Chapter 13 are obtained by adjoining elements of degree 2 over ℚ, but they
aren't isomorphic.

On the other hand, if α is a complex root of $x^3 - x + 1$, then β = α + 1 is a root of
$x^3 - 3x^2 + 2x + 1$. The fields ℚ(α) and ℚ(β) are the same. If we were presented only with
the two polynomials, it might take some time to notice how they are related.
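The relation between the two cubics can be verified by substitution. The sketch below assumes the example is read as β = α + 1 with α a root of x³ − x + 1 (the coefficients are taken from the text; the helper names are ours). Since α = β − 1, the element β is a root of f(x − 1), which the code expands exactly with integer coefficient lists.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [a0, a1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def shift(coeffs, c):
    """Return the coefficients of f(x + c), computed by Horner's rule on polynomials."""
    result = [0]
    for a in reversed(coeffs):
        result = poly_add(poly_mul(result, [c, 1]), [a])
    while len(result) > 1 and result[-1] == 0:
        result.pop()                      # drop zero leading coefficients
    return result


f = [1, -1, 0, 1]          # x^3 - x + 1, lowest degree first
g = shift(f, -1)           # f(x - 1), the polynomial satisfied by alpha + 1
```

The output list is read lowest degree first, so it should be compared with the coefficients 1, 2, −3, 1 of x³ − 3x² + 2x + 1.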

What we can describe easily are the circumstances under which there is an isomorphism
F(α) → F(β) that fixes F and sends α to β. The next proposition, though very simple, is
fundamental to our understanding of field extensions.


Proposition 15.2.8 Let F be a field, and let α and β be elements of field extensions K/F
and L/F. Suppose that α and β are algebraic over F. There is an isomorphism of fields
σ: F(α) → F(β) that is the identity on F and that sends α ↦ β if and only if the irreducible
polynomials for α and β over F are equal.


Proof. Since α is algebraic over F, F[α] = F(α), and similarly, F[β] = F(β). Suppose that
the irreducible polynomials for α and for β over F are both equal to f. Proposition 15.2.6
tells us that there are isomorphisms


$$
F[x]/(f) \xrightarrow{\;\varphi\;} F[\alpha] \quad\text{and}\quad F[x]/(f) \xrightarrow{\;\psi\;} F[\beta].
$$


The composed map $\sigma = \psi\varphi^{-1}$ is the required isomorphism F(α) → F(β). Conversely, if
there is an isomorphism σ that is the identity on F and that sends α to β, and if f(x) is a
polynomial with coefficients in F such that f(α) = 0, then f(β) = 0 too. (See Proposition
15.2.10 below.) So the irreducible polynomials for the two elements are equal. □


For instance, let $\alpha_1$ denote the real cube root of 2, and let $\omega = e^{2\pi i/3}$ be a complex cube
root of 1. The three complex roots of $x^3 - 2$ are $\alpha_1$, $\alpha_2 = \omega\alpha_1$, and $\alpha_3 = \omega^2\alpha_1$. Therefore
there is an isomorphism $\mathbb{Q}(\alpha_1) \to \mathbb{Q}(\alpha_2)$ that sends $\alpha_1$ to $\alpha_2$. In this case the elements of
$\mathbb{Q}(\alpha_1)$ are real numbers, but $\alpha_2$ is not a real number. To understand this isomorphism, we
must look only at the internal algebraic structure of the fields.


Definition 15.2.9 Let K and K’ be extensions of the same field F. An isomorphism 
gy: K — K' that restricts to the identity on the subfield F is called an F-isomorphism, or an 
isomorphism of field extensions. If there exists an F-isomorphism gy: K — K"', K and K’ are 
isomorphic extension fields. 


The next proposition was proved for complex conjugation before (12.2.19). 


Proposition 15.2.10 Let $\varphi: K \to K'$ be an isomorphism of field extensions of $F$, and let $f$
be a polynomial with coefficients in $F$. Let $\alpha$ be a root of $f$ in $K$, and let $\alpha' = \varphi(\alpha)$ be its
image in $K'$. Then $\alpha'$ is also a root of $f$.


Proof. Say that $f(x) = a_n x^n + \cdots + a_1 x + a_0$. Since $\varphi$ is an $F$-isomorphism and since the $a_i$
are in $F$, $\varphi(a_i) = a_i$. Since $\varphi$ is a homomorphism,


$$0 = \varphi(0) = \varphi(f(\alpha)) = \varphi(a_n\alpha^n + \cdots + a_1\alpha + a_0)$$
$$= \varphi(a_n)\varphi(\alpha)^n + \cdots + \varphi(a_1)\varphi(\alpha) + \varphi(a_0) = a_n\alpha'^n + \cdots + a_1\alpha' + a_0.$$


Therefore $\alpha'$ is a root of $f$. □


15.3 THE DEGREE OF A FIELD EXTENSION


A field extension K of F can always be regarded as an F-vector space. Addition is the 
addition law in K, and scalar multiplication of an element of K by an element of F is 
obtained by multiplying these two elements in K. The dimension of K, when regarded as an 
F-vector space, is called the degree of the field extension. This degree, which is denoted by
$[K:F]$, is a basic property of a field extension.


(15.3.1) $[K:F]$ is the dimension of $K$, as an $F$-vector space.


For example, C has the R-basis (1, i), so the degree [C:R] is 2. 
A field extension K / F is a finite extension if its degree is finite. Extensions of degree 2 
are quadratic extensions, those of degree 3 are cubic extensions, and so on. 


Lemma 15.3.2 


(a) A field extension K/F has degree 1 if and only if F = K. 


(b) An element $\alpha$ of a field extension $K$ has degree 1 over $F$ if and only if $\alpha$ is an element
of $F$.


Proof. (a) If the dimension of $K$ as vector space over $F$ is 1, any nonzero element of $K$,
including 1, will be an $F$-basis, and if 1 is a basis, every element of $K$ is in $F$.


(b) By definition, the degree of $\alpha$ over $F$ is the degree of the (monic) irreducible polynomial
for $\alpha$ over $F$. If $\alpha$ has degree 1, then this polynomial must be $x - \alpha$, and if $x - \alpha$ has
coefficients in $F$, then $\alpha$ is in $F$. □


Proposition 15.3.3 Assume that the field $F$ does not have characteristic 2, that is, $1 + 1 \neq 0$
in $F$. Then any extension $K$ of degree 2 over $F$ can be obtained by adjoining a square
root: $K = F(\delta)$, where $\delta^2 = D$ is an element of $F$. Conversely, if $\delta$ is an element of a field
extension of $F$, and if $\delta^2$ is in $F$ but $\delta$ is not in $F$, then $F(\delta)$ is a quadratic extension of $F$.


It is not true that all cubic extensions can be obtained by adjoining a cube root. We 
learn more about this point in the next chapter (see Section 16.11). 


Proof. We first show that every quadratic extension $K$ can be obtained by adjoining a root
of a quadratic polynomial $f(x)$ with coefficients in $F$. We choose an element $\alpha$ of $K$ that
is not in $F$. Then $(1, \alpha)$ is a linearly independent set over $F$. Since $K$ has dimension 2 as a
vector space over $F$, this set is a basis for $K$. It follows that $\alpha^2$ is a linear combination of
$(1, \alpha)$ with coefficients in $F$. We write this linear combination as $\alpha^2 = -b\alpha - c$. Then $\alpha$ is a
root of $f(x) = x^2 + bx + c$, and since $\alpha$ is not in $F$, this polynomial is irreducible over $F$.
This much is also true when the characteristic is 2.

The discriminant of the quadratic polynomial $f$ is $D = b^2 - 4c$. In a field of characteristic
not 2, the quadratic formula $\frac{1}{2}(-b \pm \sqrt{D})$ solves the equation $x^2 + bx + c = 0$. This is proved
by substituting into the polynomial. There are two choices for the square root; let $\delta$ be one
of them. Then $\delta$ is in $K$, $\delta^2$ is in $F$, and because $\alpha$ is in the field $F(\delta)$, $\delta$ generates $K$ over
$F$. Conversely, if $\delta^2$ is in $F$ but $\delta$ is not in $F$, then $(1, \delta)$ will be an $F$-basis for $F(\delta)$, so
$[F(\delta):F] = 2$. □


The term degree comes from the case that $K$ is generated by one algebraic element $\alpha$:
$K = F(\alpha)$. This is the first important property of the degree:


Proposition 15.3.4 


(a) If an element $\alpha$ of an extension field is algebraic over $F$, the degree $[F(\alpha):F]$ of $F(\alpha)$
over $F$ is equal to the degree of $\alpha$ over $F$.

(b) An element $\alpha$ of an extension field is algebraic over $F$ if and only if the degree $[F(\alpha):F]$
is finite.


Proof. If $\alpha$ is algebraic over $F$, then by definition, its degree over $F$ is equal to the degree
of its irreducible polynomial $f$ over $F$. And if $f$ has degree $n$, then $F(\alpha)$ has the $F$-basis
$(1, \alpha, \ldots, \alpha^{n-1})$ (Proposition 15.2.7), so $[F(\alpha):F] = n$. If $\alpha$ is not algebraic, then $F[\alpha]$ and
$F(\alpha)$ have infinite dimension over $F$. □


The second important property relates degrees in chains of field extensions. 


Theorem 15.3.5 Multiplicative Property of the Degree. Let $F \subset K \subset L$ be fields. Then
$[L:F] = [L:K][K:F]$. Therefore both $[L:K]$ and $[K:F]$ divide $[L:F]$.


Proof. Let $B = (\beta_1, \ldots, \beta_n)$ be a basis for $L$ as a $K$-vector space, and let $A = (\alpha_1, \ldots, \alpha_m)$
be a basis for $K$ as $F$-vector space. So $[L:K] = n$ and $[K:F] = m$. To prove the theorem,
we show that the set of $mn$ products $P = \{\alpha_i\beta_j\}$ is a basis of $L$ as $F$-vector space. The
reasoning in case one of the degrees is infinite is similar.

Let $\gamma$ be an element of $L$. Since $B$ is a basis for $L$ over $K$, $\gamma$ can be expressed uniquely
as a linear combination $b_1\beta_1 + \cdots + b_n\beta_n$, with coefficients $b_j$ in $K$. Since $A$ is a basis for
$K$ over $F$, each $b_j$ can be expressed uniquely as a linear combination $a_{1j}\alpha_1 + \cdots + a_{mj}\alpha_m$,
with coefficients $a_{ij}$ in $F$. Then $\gamma = \sum_{i,j} a_{ij}\alpha_i\beta_j$. This shows that $P$ spans $L$ as an $F$-vector
space. If a linear combination $\sum_{i,j} a_{ij}\alpha_i\beta_j$ is zero, then because $B$ is a basis for $L$ as
$K$-vector space, the coefficient $\sum_i a_{ij}\alpha_i$ of $\beta_j$ is zero for every $j$. This being so, $a_{ij}$ is zero
for every $i$ and every $j$ because $A$ is a basis for $K$ over $F$. Therefore $P$ is independent, and
hence it is a basis for $L$ over $F$. □


Corollary 15.3.6 


(a) Let $F \subset K$ be a finite field extension of degree $n$, and let $\alpha$ be an element of $K$. Then $\alpha$
is algebraic over $F$, and its degree over $F$ divides $n$.

(b) Let $F \subset F' \subset L$ be fields. If an element $\alpha$ of $L$ is algebraic over $F$, it is algebraic over
$F'$. If $\alpha$ has degree $d$ over $F$, its degree over $F'$ is at most $d$.

(c) A field extension $K$ that is generated over $F$ by finitely many algebraic elements is a
finite extension. A finite extension is generated by finitely many elements.

(d) If $K$ is an extension field of $F$, the set of elements of $K$ that are algebraic over $F$ is a
subfield of $K$.


Proof. (a) The element $\alpha$ generates an intermediate field $F \subset F(\alpha) \subset K$, and the
multiplicative property states that $[K:F] = [K:F(\alpha)][F(\alpha):F]$. Therefore $[F(\alpha):F]$ is finite,
and it divides $[K:F]$.


(b) Let $f$ denote the irreducible polynomial for $\alpha$ over $F$. Since $F \subset F'$, $f$ is also an element
of $F'[x]$. Since $\alpha$ is a root of $f$, the irreducible polynomial $g$ for $\alpha$ over $F'$ divides $f$. So the
degree of $g$ is at most equal to the degree of $f$.


(c) Let $\alpha_1, \ldots, \alpha_k$ be elements that generate $K$ and are algebraic over $F$, and let $F_i$ denote
the field $F(\alpha_1, \ldots, \alpha_i)$ generated by the first $i$ of the elements. These fields form a chain
$F = F_0 \subset F_1 \subset \cdots \subset F_k = K$. Since $\alpha_i$ is algebraic over $F$, it is also algebraic over the
larger field $F_{i-1}$. Therefore the degree $[F_i:F_{i-1}]$ is finite for every $i$. By the multiplicative
property, $[K:F]$ is finite. The second assertion is obvious.


(d) We must show that if $\alpha$ and $\beta$ are elements of $K$ that are algebraic over $F$, then $\alpha + \beta$,
$\alpha\beta$, etc., are algebraic over $F$. This follows from (a) and (c) because they are elements of
the field $F(\alpha, \beta)$. □


Corollary 15.3.7 Let $K$ be an extension field of $F$ of prime degree $p$. If an element $\alpha$ of $K$
is not in $F$, then $\alpha$ has degree $p$ over $F$ and $K = F(\alpha)$. □


Corollary 15.3.8 Let $\mathcal{K}$ be an extension field of a field $F$, let $K$ and $F'$ be subfields of $\mathcal{K}$ that
are finite extensions of $F$, and let $K'$ denote the subfield of $\mathcal{K}$ generated by the two fields $K$
and $F'$ together. Let $[K':F] = N$, $[K:F] = m$, and $[F':F] = n$. Then $m$ and $n$ divide $N$,
and $N \le mn$.


Proof. The multiplicative property shows that $m$ and $n$ divide $N$. Next, suppose that $F'$
is generated over $F$ by one element: $F' = F(\beta)$ for some element $\beta$. Then $K' = K(\beta)$.
Corollary 15.3.6(b) shows that the degree of $\beta$ over $K$, which is equal to $[K':K]$, is at most
equal to the degree of $\beta$ over $F$, which is $n$. The multiplicative property shows that $N \le mn$.
The case that $F'$ is generated by several elements follows by induction, when one adjoins one
element at a time. □


The diagram below sums up the corollary: 


(15.3.9) [Diagram of fields: $F \subset K \subset K'$ and $F \subset F' \subset K'$, with $[K:F] = m$, $[F':F] = n$, and $[K':F] = N$.]

It follows from the corollary that the degree $N$ of $K'$ over $F$ is divisible by the least common
multiple of $m$ and $n$, and that if $m$ and $n$ are relatively prime, then $N = mn$.
It might be tempting to guess that $N$ divides $mn$, but this isn’t always true.


Examples 15.3.10 


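(a) The three complex roots of $x^3 - 2$ are $\alpha_1 = \alpha$, $\alpha_2 = \omega\alpha$, and $\alpha_3 = \omega^2\alpha$, where $\alpha = \sqrt[3]{2}$
and $\omega = e^{2\pi i/3}$. Each of the roots $\alpha_i$ has degree 3 over $\mathbb{Q}$, but $\mathbb{Q}(\alpha_1, \alpha_2) = \mathbb{Q}(\alpha, \omega)$,
and since $\omega$ has degree 2 over $\mathbb{Q}$, $[\mathbb{Q}(\alpha_1, \alpha_2):\mathbb{Q}] = 6$.

(b) Let $\alpha = \sqrt[3]{2}$ and let $\beta$ be a root of the irreducible polynomial $x^4 + x + 1$ over $\mathbb{Q}$. Because
3 and 4 are relatively prime, $\mathbb{Q}(\alpha, \beta)$ has degree 12 over $\mathbb{Q}$. Therefore $\alpha$ is not in the
field $\mathbb{Q}(\beta)$. On the other hand, since $i$ has degree 2 over $\mathbb{Q}$, it is not so easy to decide
whether or not $i$ is in $\mathbb{Q}(\beta)$. (It is not.)

(c) Let $K = \mathbb{Q}(\sqrt{2}, i)$ be the field generated over $\mathbb{Q}$ by $\sqrt{2}$ and $i$. Both $i$ and $\sqrt{2}$ have degree
2 over $\mathbb{Q}$, and since $i$ is complex, it is not in $\mathbb{Q}(\sqrt{2})$. So $[\mathbb{Q}(\sqrt{2}, i):\mathbb{Q}] = 4$. Therefore the
degree of $i$ over $\mathbb{Q}(\sqrt{2})$ is 2. Since $\sqrt{-2}$ and $i$ also generate $K$, $i$ is not in the field $\mathbb{Q}[\sqrt{-2}]$
either. □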


15.4 FINDING THE IRREDUCIBLE POLYNOMIAL 


Let $\gamma$ be an element of an extension field $K$ of $F$, and assume that $\gamma$ is algebraic over
$F$. There are two general methods to find the irreducible polynomial $f(x)$ for $\gamma$ over
$F$. One is to compute the powers of $\gamma$ and to look for a linear relation among them.
Sometimes, though not very often, one can guess the other roots of $f$, say $\gamma_1, \ldots, \gamma_k$, with
$\gamma = \gamma_1$. Then expanding the product $(x - \gamma_1)\cdots(x - \gamma_k)$ will produce the polynomial.
We’ll give an example to illustrate the two methods, in which $F$ is the field $\mathbb{Q}$ of rational
numbers.


Example 15.4.1 Let $\gamma = \sqrt{2} + \sqrt{3}$. We compute powers of $\gamma$, and simplify when possible:
$\gamma^2 = 5 + 2\sqrt{6}$, $\gamma^4 = 49 + 20\sqrt{6}$. We won’t need the other powers because we can eliminate
$\sqrt{6}$ from these two equations, obtaining the relation $\gamma^4 - 10\gamma^2 + 1 = 0$. Thus $\gamma$ is a root of
the polynomial $g(x) = x^4 - 10x^2 + 1$. □
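
The arithmetic in Example 15.4.1 can be checked mechanically. The sketch below is an illustration, not part of the text: it stores an element $c_0 + c_1\sqrt{2} + c_2\sqrt{3} + c_3\sqrt{6}$ as the tuple of its coefficients, and the helper names `mul`, `add`, and `scale` are hypothetical.

```python
# Elements c0 + c1*sqrt(2) + c2*sqrt(3) + c3*sqrt(6) are stored as tuples (c0, c1, c2, c3).
# This representation and the helper names are illustrative assumptions.

def mul(x, y):
    """Multiply two elements, using sqrt2*sqrt3 = sqrt6,
    sqrt2*sqrt6 = 2*sqrt3 and sqrt3*sqrt6 = 3*sqrt2."""
    x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y
    return (
        x0 * y0 + 2 * x1 * y1 + 3 * x2 * y2 + 6 * x3 * y3,
        x0 * y1 + x1 * y0 + 3 * (x2 * y3 + x3 * y2),
        x0 * y2 + x2 * y0 + 2 * (x1 * y3 + x3 * y1),
        x0 * y3 + x3 * y0 + x1 * y2 + x2 * y1,
    )

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(c, x):
    return tuple(c * a for a in x)

ONE = (1, 0, 0, 0)
gamma = (0, 1, 1, 0)            # sqrt(2) + sqrt(3)
gamma2 = mul(gamma, gamma)      # expected: 5 + 2*sqrt(6)
gamma4 = mul(gamma2, gamma2)    # expected: 49 + 20*sqrt(6)

# gamma^4 - 10*gamma^2 + 1 should be the zero element.
relation = add(add(gamma4, scale(-10, gamma2)), ONE)
print(gamma2, gamma4, relation)
```

Working with integer coefficient tuples keeps the check exact; no floating-point square roots are involved.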


Two important elementary observations are implicit here: 


Lemma 15.4.2 

(a) A linear dependence relation $c_n\gamma^n + \cdots + c_1\gamma + c_0 = 0$ among powers of an element $\gamma$
means that $\gamma$ is a root of the polynomial $c_n x^n + \cdots + c_1 x + c_0$.

(b) Let $\alpha$ and $\beta$ be algebraic elements of an extension field of $F$, and let their degrees over
$F$ be $d_1$ and $d_2$, respectively. The $d_1 d_2$ monomials $\alpha^i\beta^j$, with $0 \le i < d_1$ and $0 \le j < d_2$,
span $F(\alpha, \beta)$ as $F$-vector space.


Proof. Though important, (a) is trivial. To prove (b), we note that because $\alpha$ and $\beta$ are
algebraic over $F$, $F(\alpha, \beta) = F[\alpha, \beta]$ (15.2.6). The monomials listed span $F[\alpha, \beta]$. □

Example 15.4.3 The alternate approach to Example 15.4.1 is to guess that the roots of $g$
might be $\gamma_1 = \sqrt{2} + \sqrt{3}$, $\gamma_2 = -\sqrt{2} - \sqrt{3}$, $\gamma_3 = -\sqrt{2} + \sqrt{3}$, and $\gamma_4 = \sqrt{2} - \sqrt{3}$. Expanding
the polynomial with these roots, we find


$$(x - \gamma_1)(x - \gamma_2)(x - \gamma_3)(x - \gamma_4) = \bigl(x^2 - (\sqrt{2} + \sqrt{3})^2\bigr)\bigl(x^2 - (\sqrt{2} - \sqrt{3})^2\bigr) = x^4 - 10x^2 + 1.$$


This is the polynomial that we obtained before. □
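
The expansion in Example 15.4.3 can also be confirmed numerically. The following is a rough floating-point sketch under the assumption that rounding the computed coefficients to the nearest integer is legitimate here (the errors are far below 1/2); the helper `poly_mul` is a hypothetical name.

```python
import math

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

r2, r3 = math.sqrt(2), math.sqrt(3)
roots = [r2 + r3, -r2 - r3, -r2 + r3, r2 - r3]   # the four guessed roots

poly = [1.0]
for g in roots:
    poly = poly_mul(poly, [-g, 1.0])             # multiply by (x - g)

rounded = [round(c) for c in poly]               # coefficients of 1, x, x^2, x^3, x^4
print(rounded)
```

The rounded coefficient list corresponds to $1 - 10x^2 + x^4$, in agreement with the relation found by the first method.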


The lemma shows that one can always produce a polynomial having an element such
as $\gamma = \alpha + \beta$ as a root, provided that the irreducible polynomials for $\alpha$ and $\beta$ are known.
Say that $\alpha$ and $\beta$ have degrees $d_1$ and $d_2$ over $F$, respectively. Given any element $\gamma$ of
$F(\alpha, \beta)$, we write its powers $1, \gamma, \gamma^2, \ldots, \gamma^n$ as linear combinations of the monomials
$\alpha^i\beta^j$ with $0 \le i < d_1$ and $0 \le j < d_2$. When $n = d_1 d_2$, we get $n + 1$ powers $\gamma^\nu$ that
are linear combinations of $n$ monomials. So the powers are linearly dependent. A linear
dependence relation determines a polynomial with coefficients in $F$ with $\gamma$ as a root.
However, there is a point that complicates matters. The polynomial with root $\gamma$ that we
find in this way may be reducible. The irreducible polynomial for $\gamma$ over $F$ is the lowest
degree polynomial with root $\gamma$. To determine it by this method, we would need a basis for $K$
over $F$.
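
When a genuine basis is available, the search for the lowest-degree relation can be automated. The sketch below is an illustration under stated assumptions: it takes $K = \mathbb{Q}(\sqrt{2}, \sqrt{3})$ with the basis $(1, \sqrt{2}, \sqrt{3}, \sqrt{6})$ (which Examples 15.4.4(a) asserts is a basis), describes $\gamma = \sqrt{2} + \sqrt{3}$ by the matrix of multiplication by $\gamma$ in that basis, and looks for the first power of $\gamma$ that is a combination of lower powers. The function names `solve_exact` and `minimal_polynomial` are hypothetical.

```python
from fractions import Fraction

def solve_exact(columns, target):
    """Return c with sum(c[k] * columns[k]) == target, or None if there is none.
    Assumes the given columns are linearly independent."""
    rows, n = len(target), len(columns)
    M = [[Fraction(columns[k][r]) for k in range(n)] + [Fraction(target[r])]
         for r in range(rows)]
    pivot_row = 0
    for col in range(n):
        p = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if p is None:
            return None                      # not expected for independent columns
        M[pivot_row], M[p] = M[p], M[pivot_row]
        piv = M[pivot_row][col]
        M[pivot_row] = [x / piv for x in M[pivot_row]]
        for r in range(rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
    if any(M[r][n] != 0 for r in range(pivot_row, rows)):
        return None                          # inconsistent system: target not in the span
    return [M[k][n] for k in range(n)]

def minimal_polynomial(mult):
    """mult[r][c] = coefficient of basis element r in gamma * (basis element c).
    The first basis element is assumed to be 1.
    Returns the monic lowest-degree relation, coefficients listed lowest degree first."""
    dim = len(mult)
    v = [Fraction(1)] + [Fraction(0)] * (dim - 1)
    powers = [v]
    while True:
        v = [sum(mult[r][c] * v[c] for c in range(dim)) for r in range(dim)]
        c = solve_exact(powers, v)
        if c is not None:
            return [-x for x in c] + [Fraction(1)]
        powers.append(v)

# Multiplication by gamma = sqrt(2) + sqrt(3) in the basis (1, sqrt2, sqrt3, sqrt6).
MULT_BY_GAMMA = [
    [0, 2, 3, 0],
    [1, 0, 0, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
]
coeffs = minimal_polynomial(MULT_BY_GAMMA)
print(coeffs)
```

Because the coordinates are taken with respect to a basis rather than a mere spanning set, the first dependence found among the powers is the lowest-degree one, so the result here is the irreducible polynomial rather than a possibly reducible multiple of it.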


Examples 15.4.4 


(a) In Example 15.4.1, where $\alpha = \sqrt{2}$, $\beta = \sqrt{3}$ and $d_1 = d_2 = 2$, the elements $\alpha^i\beta^j$ with
$i, j < 2$ are $1$, $\sqrt{2}$, $\sqrt{3}$, and $\sqrt{6}$. These elements do form a basis of $K$ over $\mathbb{Q}$. The
polynomial $x^4 - 10x^2 + 1$ is irreducible.

(b) We go back to Example 15.3.10(a), in which the three roots of the polynomial $f(x) = x^3 - 2$
are labeled $\alpha_i$, $i = 1, 2, 3$. Let $F = \mathbb{Q}$, $L = \mathbb{Q}(\alpha_1)$, and $K = \mathbb{Q}(\alpha_1, \alpha_2)$. Each of the
roots $\alpha_i$ has degree 3 over $F$. According to the lemma, the nine monomials $\alpha_1^i\alpha_2^j$ with
$0 \le i, j < 3$ span $K$ over $F$. However, these monomials aren’t independent. Since $f$ has
a root $\alpha_1$ in the field $L$, it factors in $L[x]$, say $f(x) = (x - \alpha_1)q(x)$. Then $\alpha_2$ is a root of
$q(x)$, so $\alpha_2$ has degree at most 2 over $L$. The set $(1, \alpha_2)$ is a basis for $K$ over the field
$L$, so the six monomials $\alpha_1^i\alpha_2^j$ with $0 \le i < 3$ and $0 \le j < 2$ form a basis for $K$ over $F$.
If we want a basis of monomials, we should use this one. □


15.5 RULER AND COMPASS CONSTRUCTIONS 


Famous theorems assert that certain geometric constructions cannot be done with ruler and 
compass alone. To illustrate these theorems, we use the concept of degree of a field extension 
to prove the impossibility of trisection of an angle. 


Here are the rules for ruler and compass construction: 


(15.5.1) 


• Two points in the plane are given to start with. These points are constructed.
• If two points $p_0$, $p_1$ have been constructed, we may draw the line through them, or
draw a circle with center at $p_0$ and passing through $p_1$. Such lines and circles are
then constructed.


• The points of intersection of constructed lines and circles are constructed.


Points, lines, and circles will be called constructible if they can be obtained in finitely many 
steps, using these rules. 

Notice that our ruler may be used only to draw lines through constructed points. 
We are not allowed to use it for measurement. Sometimes the ruler is referred to as a
“straight-edge” to emphasize this point.

We begin with some familiar constructions. In each figure, the lines and circles are to
be drawn in the order indicated. The first two constructions make use of a point $q$ on $\ell$ whose
only restriction is that it is not on the perpendicular. Whenever we need an arbitrary point,
we will construct a particular one for the purpose. We can do this because a constructed line
$\ell$ contains infinitely many points that can be constructed.


Construction 15.5.2 Construct a line through a constructed point $p$ and perpendicular to a
constructed line $\ell$.
Case 1: $p \notin \ell$


Case 2: $p \in \ell$


Construction 15.5.3 Construct a line parallel to a constructed line $\ell$ and passing through a
constructed point $p$.
Apply Cases 1 and 2 above:


Construction 15.5.4 Mark off a length defined by two points onto a constructed line $\ell$, with
endpoint $p$.
Use the construction of parallels:


[Figure: the given length marked off on $\ell$.]

We introduce Cartesian coordinates into the plane so that the points that are given at 
the start have coordinates (0, 0) and (1, 0). 


Proposition 15.5.5 


(a) Let $p_0 = (a_0, b_0)$ and $p_1 = (a_1, b_1)$ be points whose coordinates $a_i$ and $b_i$ are in a
subfield $F$ of the field of real numbers. The line through $p_0$ and $p_1$ is defined by a linear
equation with coefficients in $F$. The circle with center $p_0$ and passing through $p_1$ is
defined by a quadratic equation with coefficients in $F$.

(b) Let A and B be lines or circles defined by linear or quadratic equations, respectively, 
that have coefficients in a subfield F of the real numbers. Then the points of intersection 
of A and B have coordinates in F, or in a real quadratic field extension F’ of F. 


Proof. (a) The line through $(a_0, b_0)$ and $(a_1, b_1)$ is the locus of the linear equation


$$(a_1 - a_0)(y - b_0) = (b_1 - b_0)(x - a_0).$$


The circle with center $(a_0, b_0)$ and passing through $(a_1, b_1)$ is the locus of the quadratic
equation


$$(x - a_0)^2 + (y - b_0)^2 = (a_1 - a_0)^2 + (b_1 - b_0)^2.$$

The coefficients of these equations are in $F$.


(b) The point of intersection of two lines is found by solving two linear equations with
coefficients in $F$, so its coordinates are in $F$. To find the intersection of a line and a circle,
we use the equation of the line to eliminate one variable from the equation of the circle,
obtaining a quadratic equation in one unknown. This quadratic equation has solutions in
the field $F' = F(\sqrt{D})$, where $D$ is its discriminant. The discriminant is an element of $F$. If
$F' \neq F$, then the degree of $F'$ over $F$ is 2. If $D$ is negative, there is no real solution to the
equations. Then the line and circle do not intersect.


Consider the intersection of two circles, say

$$(x - a_1)^2 + (y - b_1)^2 = c_1 \quad\text{and}\quad (x - a_2)^2 + (y - b_2)^2 = c_2,$$


where $a_i$, $b_i$, and $c_i$ are in $F$. In general, the solution of a pair of quadratic equations in two
variables requires solving an equation of degree 4. In this case we are lucky: The difference
of the two quadratic equations is a linear equation. We can use that linear equation to
eliminate one variable, as before. The lucky event reflects the fact that, whereas a pair of
conics may intersect in four points, two circles intersect in at most two points. □
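
The elimination step for two circles can be made concrete. The sketch below is illustrative only; the function name `radical_line` and the choice of the two unit circles centered at the starting points $(0, 0)$ and $(1, 0)$ are assumptions made for the example.

```python
from fractions import Fraction

def radical_line(circle1, circle2):
    """Each circle is a triple (a, b, c), meaning (x - a)^2 + (y - b)^2 = c.
    Subtracting the second equation from the first cancels x^2 and y^2
    and leaves a linear equation A*x + B*y = C, returned as (A, B, C)."""
    a1, b1, c1 = circle1
    a2, b2, c2 = circle2
    A = 2 * (a2 - a1)
    B = 2 * (b2 - b1)
    C = (c1 - c2) + (a2**2 - a1**2) + (b2**2 - b1**2)
    return A, B, C

# Unit circles centered at the two starting points (0, 0) and (1, 0).
A, B, C = radical_line((0, 0, 1), (1, 0, 1))   # gives 2x = 1
x = Fraction(C, A)                              # B is 0 here, so the line is x = 1/2
y_squared = 1 - x**2                            # substitute into x^2 + y^2 = 1
print(A, B, C, x, y_squared)
```

In this instance $y^2 = 3/4$, so the intersection points have coordinates in the real quadratic extension $\mathbb{Q}(\sqrt{3})$ of $\mathbb{Q}$, as part (b) of the proposition predicts.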


Theorem 15.5.6 Let $p$ be a constructible point. For some integer $n$, there is a chain of fields
$\mathbb{Q} = F_0 \subset F_1 \subset F_2 \subset \cdots \subset F_n = K$ such that


• $K$ is a subfield of the field of real numbers;
• the coordinates of $p$ are in $K$;
• for each $i = 0, \ldots, n - 1$, the degree $[F_{i+1}:F_i]$ is equal to 2.


Therefore the degree [K :Q] is a power of 2. 


Proof. We introduced coordinates so that the points originally given are (0, 0) and (1, 0). 
These points have coordinates in Q. The process of constructing the point p involves a 
sequence of steps, each one of which draws a line or a circle. 

Suppose that all points constructed by the time we are at the kth step have coordinates 
in a field F. The next step constructs a line or circle through two of these points, and 
according to Proposition 15.5.5(a), the line or circle has an equation with coefficients in F’. 
The field does not change. Then according to Proposition 15.5.5(b), any point of intersection 
of the lines and circles constructed so far will have coordinates, either in F, or in a real 
quadratic extension of F. The assertion follows by induction from Proposition 15.5.5 and 
from the multiplicative property of the degree. O 


• We call a real number $a$ constructible if the point $(a, 0)$ is constructible. Since we
can construct perpendiculars, this is the same thing as saying that $a$ is the $x$-coordinate
of a constructible point. And since we can mark off lengths, a positive real number $a$ is
constructible if and only if there is a pair $p$, $q$ of constructible points whose distance apart is $a$.

Corollary 15.5.7 Let $a$ be a constructible real number. Then $a$ is an algebraic number, and
its degree over $\mathbb{Q}$ is a power of 2.


Proof. Since $a$ is in a field $K$ that is the end of a chain of fields as in the theorem, and since $[K:\mathbb{Q}]$
is a power of 2, the degree of $a$ is also a power of 2 (15.3.6). □


The converse of this corollary is false. There exist real numbers of degree 4 over Q that 
aren’t constructible. Galois theory provides a way to understand this. (This is Exercise 9.17 
of Chapter 16.) 

We can now prove the impossibility of certain geometric constructions. The method 
is to show that if a certain construction were possible, then it would also be possible to 
construct an algebraic number whose degree over Q is not a power of 2. This would contradict 
the corollary. Our example is the impossibility of trisection of the angle, which asks for a
construction of the angle $\frac{1}{3}\theta$ when $\theta$ is given. Now many angles, 45° for instance, can be
trisected. The trisection problem asks for a general method of construction that will work
for any “given” angle.

Since it is easy to construct an angle of 60°, we can give this angle to ourselves, using 
ruler and compass constructions. If trisection were possible, we could construct an angle of 
20°. We will show that it is impossible to construct that particular angle, and therefore that 
there is no general method of trisection. 

We’ll say that an angle $\theta$ is constructible if it is possible to construct a pair of lines
meeting with angle $\theta$. If we mark off a unit length on one of the lines and drop a perpendicular
to the other line, we will have constructed the real number $\cos\theta$. Conversely, if $\cos\theta$ is a
constructible real number, we can reverse this process to construct a pair of lines meeting
with angle $\theta$.

The next lemma shows that $20° = \pi/9$ cannot be constructed.


Lemma 15.5.8 The real number cos 20° is algebraic over Q and its degree over Q is 3. 
Therefore cos 20° is not a constructible number. 


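Proof. Let $\alpha = 2\cos\theta = e^{i\theta} + e^{-i\theta}$, where $\theta = \pi/9$. Then $e^{3i\theta} + e^{-3i\theta} = 2\cos(\pi/3) = 1$,
and

$$\alpha^3 = (e^{i\theta} + e^{-i\theta})^3 = e^{3i\theta} + 3e^{i\theta} + 3e^{-i\theta} + e^{-3i\theta} = 1 + 3\alpha,$$


so $\alpha$ is a root of the polynomial $x^3 - 3x - 1$. This polynomial is irreducible over $\mathbb{Q}$ because it
has no integer root: a monic cubic with integer coefficients that factors over $\mathbb{Q}$ has a rational
root, and such a root must be an integer. It is therefore the irreducible polynomial for $\alpha$ over $\mathbb{Q}$.
So $\alpha$ has degree 3 over $\mathbb{Q}$, and so does $\cos\theta$. □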
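
Both claims about $x^3 - 3x - 1$ are easy to sanity-check. The following sketch is not a proof; it checks numerically that $2\cos 20°$ is a root, and it evaluates the polynomial at the only possible integer roots, which must divide the constant term and so are $\pm 1$.

```python
import math

def f(x):
    return x**3 - 3 * x - 1

alpha = 2 * math.cos(math.pi / 9)      # 2*cos(20 degrees)
residual = f(alpha)                     # should vanish up to rounding error

# An integer root of a monic integer polynomial divides the constant term,
# so the only candidates here are 1 and -1.
candidate_values = {r: f(r) for r in (1, -1)}
print(residual, candidate_values)
```

Neither candidate gives zero, so the cubic has no integer root.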


One more example: The regular 7-gon cannot be constructed. This is similar to the
above problem, because constructing 20° is equivalent to constructing the 18-gon. We’ll vary
the approach slightly. Let $\theta = 2\pi/7$ and let $\zeta = e^{i\theta}$. Then $\zeta$ is a seventh root of unity, a
root of the irreducible polynomial $x^6 + x^5 + \cdots + 1$ (Theorem 12.4.9), so $\zeta$ has
degree 6 over $\mathbb{Q}$. If the 7-gon were constructible, then $\cos\theta$ and $\sin\theta$ would be constructible
numbers. They would lie in a real field extension whose degree over $\mathbb{Q}$ is a power of 2, say
$2^k$. Call this field $K$, and consider the extension $K(i)$ of $K$. This extension has degree 2, so
$[K(i):\mathbb{Q}] = 2^{k+1}$. But $\zeta = \cos\theta + i\sin\theta$ is in $K(i)$, so its degree would divide $2^{k+1}$
(Corollary 15.3.6(a)). This contradicts the fact that the degree of $\zeta$ is 6.

The argument we have used is not special to the number 7. It applies to any prime
integer $p$, provided that $p - 1$, the degree of the irreducible polynomial $x^{p-1} + \cdots + x + 1$,
is not a power of 2.


Corollary 15.5.9 Let $p$ be a prime integer. If the regular $p$-gon can be constructed with ruler
and compass, then $p = 2^r + 1$ for some integer $r$. □


Gauss proved the converse: If a prime has the form $2^r + 1$, then the regular $p$-gon can be
constructed. The regular 17-gon, for example, can be constructed by ruler and compass. We
will learn how to prove this in the next chapter (see Corollary 16.10.5).
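
To see how restrictive the condition $p = 2^r + 1$ is, one can list the odd primes below a small bound that satisfy it. This is an illustrative sketch; the function names and the bound 300 are arbitrary choices made for the example.

```python
def is_prime(n):
    """Trial division, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def candidate_primes(bound):
    """Odd primes p <= bound for which p - 1 is a power of 2."""
    return [p for p in range(3, bound + 1)
            if is_prime(p) and is_power_of_two(p - 1)]

print(candidate_primes(300))
```

Primes such as 7, 11, and 13 do not appear in the list, so by the corollary the corresponding regular polygons cannot be constructed.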


To complete the discussion, we prove a converse to Theorem 15.5.6. 


Theorem 15.5.10 Let $\mathbb{Q} = F_0 \subset F_1 \subset \cdots \subset F_n = K$ be a chain of subfields of the field $\mathbb{R}$
of real numbers with the property that for each $i = 0, \ldots, n - 1$, $[F_{i+1}:F_i] = 2$. Then every
element of $K$ is constructible.


Since any extension of degree 2 can be obtained by adjoining a square root, the theorem 
follows from the next lemma. 


Lemma 15.5.11 


(a) The constructible numbers form a subfield of R. 
(b) If $a$ is a positive constructible number, then so is $\sqrt{a}$.


Proof. (a) We must show that if $a$ and $b$ are positive constructible numbers, then $a + b$, $-a$,
$ab$, and $a^{-1}$ (if $a \neq 0$) are also constructible. The closure in case $a$ or $b$ is negative follows
easily. Addition and subtraction are done by marking lengths on a line. For multiplication
and division, we use similar right triangles.


Given one triangle and one side of a second triangle, the second triangle can be constructed
by parallels. To construct the product $ab$, we take $r = 1$, $s = a$, and $r' = b$. Then $s' = ab$. To
construct $a^{-1}$, we take $r = a$, $s = 1$, and $r' = 1$. Then $s' = a^{-1}$.

(b) We use similar triangles again. We must construct them so that $r = a$, $r' = s$, and
$s' = 1$. Then $s = \sqrt{a}$. How to make the construction is less obvious this time, but we can
use inscribed triangles in a circle. A triangle inscribed into a circle, with a diameter as its
hypotenuse, is a right triangle. This is a theorem of high school geometry, and it can be
checked using the equation for a circle and Pythagoras’s theorem. So we construct a circle
whose diameter is $1 + a$ and proceed as in the figure below.
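
The figure is not reproduced here, so the following sketch assumes the standard arrangement: the diameter of length $1 + a$ lies on the $x$-axis from $(0, 0)$ to $(1 + a, 0)$, the point at distance $a$ from the left end divides it into segments of lengths $a$ and $1$, and a perpendicular is erected there until it meets the circle. The function name `sqrt_by_semicircle` is hypothetical.

```python
import math

def sqrt_by_semicircle(a):
    """Height of the circle with diameter [0, 1 + a] above the point x = a.
    By similar right triangles this height h satisfies a / h = h / 1."""
    radius = (1 + a) / 2
    center_x = radius
    foot_x = a                                   # splits the diameter into lengths a and 1
    return math.sqrt(radius**2 - (foot_x - center_x)**2)

print(sqrt_by_semicircle(9), sqrt_by_semicircle(2))
```

Algebraically, $\left(\tfrac{1+a}{2}\right)^2 - \left(a - \tfrac{1+a}{2}\right)^2 = a$, so the height is $\sqrt{a}$, matching $s = \sqrt{a}$ in the proof.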

15.6 ADJOINING ROOTS


Up to this point, we have used subfields of the complex numbers as our examples. Abstract 
constructions are not needed to create these fields, except that the construction of the 
complex number field as an extension of the real number field is abstract. We simply adjoin 
complex numbers to the rational numbers as desired, and work with the subfield they 
generate. But finite fields and function fields are not subfields of a familiar, all-encompassing 
field analogous to C, so these fields must be constructed. The fundamental tool for their 
construction is the adjunction of elements to a ring, which was described in Chapter 11. It is 
applied here to the case that the ring we start with is a field. 

We review the construction. Given a polynomial f(x) with coefficients in a field F’, we 
may adjoin a root of f to F. The procedure is to form the quotient ring 


(15.6.1) $K = F[x]/(f)$


of the polynomial ring $F[x]$. This construction always yields a ring $K$ and a homomorphism
$F \to K$, such that the residue $\bar{x}$ of $x$ satisfies the relation $f(\bar{x}) = 0$ (11.5.2). However, we
want to construct not only a ring, but a field. Here the theory of polynomials over a field
comes into play. It tells us that the principal ideal $(f)$ in the polynomial ring $F[x]$ is a
maximal ideal if and only if $f$ is an irreducible polynomial (12.2.8). Therefore $K$ will be a
field if and only if $f$ is irreducible (11.8.2).


Lemma 15.6.2 Let $F$ be a field, and let $f$ be an irreducible polynomial in $F[x]$. Then the
ring $K = F[x]/(f)$ is an extension field of $F$, and the residue $\bar{x}$ of $x$ is a root of $f(x)$ in $K$.


Proof. The ring $K$ is a field because $(f)$ is a maximal ideal, and the homomorphism
$F \to K$, which sends the elements of $F$ to the residues of the constant polynomials, is
injective because $F$ is a field (11.3.20). So we may identify $F$ with its image, a subfield of $K$.
The field $K$ becomes an extension of $F$ by means of this identification. Finally, $\bar{x}$ satisfies
the equation $f(\bar{x}) = 0$. It is a root of $f$ (see (11.5.2)). □


• A polynomial $f$ splits completely in a field $K$ if it factors into linear factors in $K$.


Proposition 15.6.3 Let F be a field, and let f(x) be a monic polynomial in F[x] of positive 
degree. There exists a field extension K of F such that f(x) splits completely in K. 


Proof. We use induction on the degree of $f$. The first case is that $f$ has a root $\alpha$ in $F$, so
that $f(x) = (x - \alpha)q(x)$ for some polynomial $q$. If so, we replace $f$ by $q$, and we are done
by induction. Otherwise, we choose an irreducible factor $g$ of $f$. By Lemma 15.6.2, there is
a field extension $F_1$ of $F$ in which $g$ has a root $\alpha$. Then $\alpha$ is a root of $f$ too. We replace $F$
by $F_1$, and this reduces us to the first case. □


As we see, the polynomial ring $F[x]$ is an important tool for studying extensions of
a field $F$. When we are working with field extensions, there is an interplay between the
polynomial rings over the fields. This interplay doesn’t present serious difficulties, but instead
of scattering the points that should be mentioned throughout the text, we have collected them
into the next proposition.


Proposition 15.6.4 Let $f$ and $g$ be polynomials with coefficients in a field $F$, with $f \neq 0$, and
let $K$ be an extension field of $F$.


(a) The polynomial ring K[x] contains F[x] as subring, so computations made in the ring 
F [x] are also valid in K[x]. 

(b) Division with remainder of g by f gives the same answer, whether carried out in F[x] 
or in K[x]. 

(c) $f$ divides $g$ in $K[x]$ if and only if $f$ divides $g$ in $F[x]$.

(d) The (monic) greatest common divisor d of f and g is the same, whether computed in 
F[x] or in K[x]. 

(e) If f and g have a common root in K, they are not relatively prime in F[x]. If f and 
g are not relatively prime in F[x], there exists an extension field in which they have a 
common root. 

(f) If f is an irreducible element of F[x] and if f and g have a common root in K, then f 
divides g in F[x]. 


Proof. (a) This is obvious. 


(b) Carry out the division in F[x]: g = fq +r. This equation remains true in the bigger ring 
K[x], and since division with remainder in K[x] is unique, carrying the division out in K[x] 
leads to the same result. 


(c) This is (b) in the case that the remainder is zero. 


(d) Let $d$ and $d'$ denote the greatest common divisors of $f$ and $g$ in $F[x]$ and $K[x]$,
respectively. Then $d$ is a common divisor in $K[x]$, and since $d'$ is the greatest common divisor
in $K[x]$, $d$ divides $d'$. In addition, we know that $d$ has the form $d = pf + qg$, for some
elements $p$ and $q$ in $F[x]$. Since $d'$ divides $f$ and $g$, $d'$ divides $d$. Thus $d$ and $d'$ are associates
in $K[x]$, and since they are monic polynomials, they are equal.


(e) Let $\alpha$ be a common root of $f$ and $g$ in $K$. Then $x - \alpha$ is a common divisor of $f$ and
$g$ in $K[x]$. So the greatest common divisor of $f$ and $g$ in $K[x]$ isn’t 1. By (d), it isn’t 1 in
$F[x]$ either. Conversely, if $f$ and $g$ have a common divisor $d$ of positive degree, there is an
extension field of $F$ in which $d$ has a root. This root will be a common root of $f$ and $g$.


(f) If $f$ is irreducible, its only monic divisors in $F[x]$ are 1 and $f$. Part (e) tells us that the
greatest common divisor of $f$ and $g$ in $F[x]$ isn’t 1. Therefore it is $f$. □
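
Parts (d) and (e) say that common roots in an extension field can be detected without leaving $F[x]$. The sketch below illustrates this over $F = \mathbb{Q}$ with the Euclidean algorithm; the helper names `trim`, `poly_divmod`, and `poly_gcd` are hypothetical, and polynomials are written as coefficient lists with the highest degree first.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients (highest degree first)."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]

def poly_divmod(g, f):
    """Division with remainder g = f*q + r in Q[x]; f must be nonzero."""
    r = trim([Fraction(c) for c in g])
    f = trim([Fraction(c) for c in f])
    assert f, "division by the zero polynomial"
    q = []
    while len(r) >= len(f):
        coef = r[0] / f[0]
        q.append(coef)
        for i in range(len(f)):
            r[i] -= coef * f[i]
        r.pop(0)                      # the leading term has been cancelled
    return q, trim(r)

def poly_gcd(f, g):
    """Monic greatest common divisor in Q[x]; f and g not both zero."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while g:
        _, r = poly_divmod(f, g)
        f, g = g, r
    return [c / f[0] for c in f]

common = poly_gcd([1, 0, -2], [1, 0, 0, 0, -4])   # x^2 - 2 and x^4 - 4
coprime = poly_gcd([1, 0, -2], [1, 0, 1])         # x^2 - 2 and x^2 + 1
print(common, coprime)
```

The polynomials $x^2 - 2$ and $x^4 - 4$ have no rational roots at all, yet their greatest common divisor in $\mathbb{Q}[x]$ is $x^2 - 2$, which reflects their common roots $\pm\sqrt{2}$ in the extension $\mathbb{Q}(\sqrt{2})$.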


The final topic of this section is the derivative $f'(x)$ of a polynomial $f(x)$. The
derivative is computed using the rules from calculus for differentiating polynomial functions.
In other words, if $f(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$, then

(15.6.5) $f'(x) = n a_n x^{n-1} + (n - 1)a_{n-1}x^{n-2} + \cdots + 1a_1.$


The integer coefficients in this formula are interpreted as the elements $1 + \cdots + 1$ of $F$. So if
$f$ has coefficients in a field $F$, its derivative does too. It can be shown that familiar rules of
differentiation, such as the product rule, hold. (This is Exercise 115.)


The derivative can be used to recognize multiple roots of a polynomial. 


Lemma 15.6.6 Let $f$ be a polynomial with coefficients in a field $F$. An element $\alpha$ in an
extension field $K$ of $F$ is a multiple root, meaning that $(x - \alpha)^2$ divides $f$, if and only if it is
a root of $f$ and also a root of $f'$.


Proof. If $\alpha$ is a root of $f$, then $x - \alpha$ divides $f$, say $f(x) = (x - \alpha)g(x)$. Then $\alpha$ is a multiple
root of $f$ if and only if it is a root of $g$. By the product rule for differentiation,


$$f'(x) = (x - \alpha)g'(x) + g(x).$$

Substituting $x = \alpha$, one sees that $f'(\alpha) = 0$ if and only if $g(\alpha) = 0$. □
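
The criterion of the lemma is straightforward to apply to a specific polynomial. The sketch below uses integer coefficients listed from the constant term upward; the function names are hypothetical and the polynomial $x^3 - 3x + 2 = (x - 1)^2(x + 2)$ is chosen only as an example.

```python
def derivative(coeffs):
    """coeffs[k] is the coefficient of x^k; returns the coefficients of f'."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value

def is_multiple_root(coeffs, a):
    """True when a is a root of both f and f'."""
    return evaluate(coeffs, a) == 0 and evaluate(derivative(coeffs), a) == 0

f = [2, -3, 0, 1]                  # x^3 - 3x + 2 = (x - 1)^2 (x + 2)
print(derivative(f), is_multiple_root(f, 1), is_multiple_root(f, -2))
```

Here 1 is a root of both $f$ and $f' = 3x^2 - 3$, while $-2$ is a root of $f$ only, so 1 is a multiple root and $-2$ is a simple root.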


Proposition 15.6.7 Let f(x) be a polynomial with coefficients in F. There exists a field 
extension K of F in which f has a multiple root if and only if f and f’ are not relatively 
prime. 


Proof. If $f$ has a multiple root in $K$, then $f$ and $f'$ have a common root in $K$, so they are
not relatively prime in $K[x]$ or in $F[x]$. Conversely, if $f$ and $f'$ are not relatively prime, then they
have a common root in some field extension $K$, hence $f$ has a multiple root there. □


Here is one of the most important applications of the derivative to field theory: 


Proposition 15.6.8 Let $f$ be an irreducible polynomial in $F[x]$.


(a) f has no multiple root in any field extension of F unless the derivative f’ is the zero 
polynomial. 


(b) If $F$ is a field of characteristic zero, then $f$ has no multiple root in any field extension of
$F$.


Proof. (a) We must show that $f$ and $f'$ are relatively prime unless $f'$ is the zero polynomial.
Since it is irreducible, $f$ will have a nonconstant factor in common with another polynomial
$g$ only if $f$ divides $g$. And if $f$ divides $g$, then unless $g = 0$, the degree of $g$ will be at least
as large as the degree of $f$. If the derivative $f'$ isn’t zero, its degree is less than the degree
of $f$, and then $f$ and $f'$ have no common nonconstant factor.


(b) In a field of characteristic zero, the derivative of a nonconstant polynomial isn’t zero. □


The derivative of a nonconstant polynomial f may be zero when F has prime 
characteristic p. This happens when the exponent of every monomial that occurs in f is 
divisible by p. A typical polynomial whose derivative is zero in characteristic 5 is 


    f(x) = x^{15} + a x^{10} + b x^5 + c,

where a, b, c can be any elements of F. Since the derivative of this polynomial is identically
zero, all of its roots in an extension field will be multiple roots.
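As a worked check, here is formula (15.6.5) applied term by term to a polynomial of the shape x^{15} + a x^{10} + b x^5 + c, whose exponents are all divisible by 5. Every integer coefficient that appears is a multiple of 5, so each term vanishes in characteristic 5:

```latex
f'(x) \;=\; 15\,x^{14} + 10\,a\,x^{9} + 5\,b\,x^{4} \;=\; 0 ,
\qquad\text{because } 15 = 10 = 5 = 0 \text{ in a field of characteristic } 5 .
```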


15.7 FINITE FIELDS 


In this section, we describe the fields of finite order. The characteristic of a finite field K
cannot be zero, so it is a prime integer (3.2.10), and therefore K will contain one of the prime
fields F = F_p. Since K is finite, it will be finite-dimensional when considered as a vector
space over this field.

Let r denote the degree [K : F]. As an F-vector space, K is isomorphic to the space
F^r of column vectors, which contains p^r elements. So the order of a finite field, the number
of its elements, is a power of a prime. It is customary to use the letter q for this order:

(15.7.1)    |K| = p^r = q.

In this section, q will denote a positive power of a prime integer p. Fields of order q are
often denoted by F_q. We are going to show that all finite fields of order q are isomorphic, so
this notation isn't too ambiguous, though when r > 1 the isomorphism between two of them
will not be unique.

The simplest example of a finite field other than a prime field is the field F_4 of order 4.
Let K denote this field, and let F = F_2. There is just one irreducible polynomial of degree 2
in F[x], namely x^2 + x + 1 (12.4.4), and K is obtained by adjoining a root α of this polynomial
to F:

    K ≈ F[x]/(x^2 + x + 1).

Because the element α, the residue of x, has degree 2, the set (1, α) forms a basis of K over
F (15.2.7). The elements of K are the four linear combinations of the basis, with coefficients
modulo 2:

(15.7.2)    K = {0, 1, α, 1 + α}.

The element 1 + α is the other root of x^2 + x + 1 in K. Computation in F_4 is made using the
relations 1 + 1 = 0 and α^2 + α + 1 = 0.

Try not to confuse the field F_4 with the ring Z/(4), which isn't a field.
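The two relations 1 + 1 = 0 and α^2 + α + 1 = 0 determine all sums and products in F_4. The Python sketch below makes this concrete. The encoding of c0 + c1·α as a pair (c0, c1), and the names `add`, `mul` and `table`, are assumptions of this illustration.

```python
# An element c0 + c1*alpha of F_4 is stored as the pair (c0, c1), c0, c1 in {0, 1}.
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]          # 0, 1, alpha, 1 + alpha
NAMES = {(0, 0): "0", (1, 0): "1", (0, 1): "a", (1, 1): "1+a"}

def add(u, v):
    """Add coefficientwise modulo 2 (the relation 1 + 1 = 0)."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    """Multiply, then reduce with alpha^2 = alpha + 1."""
    c0 = u[0] * v[0]
    c1 = u[0] * v[1] + u[1] * v[0]
    c2 = u[1] * v[1]                     # coefficient of alpha^2
    return ((c0 + c2) % 2, (c1 + c2) % 2)

def table(op):
    """Operation table as rows of element names, in the order of ELEMENTS."""
    return [[NAMES[op(u, v)] for v in ELEMENTS] for u in ELEMENTS]

if __name__ == "__main__":
    for row in table(mul):
        print("  ".join(f"{name:>3}" for name in row))
```

In this arithmetic α(1 + α) = 1, so every nonzero element has an inverse. In Z/(4), by contrast, 2·2 = 0, which is why that ring is not a field.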


Here are the main facts about finite fields: 


Theorem 15.7.3 Let p be a prime integer, and let q = p^r be a positive power of p.

(a) Let K be a field of order q. The elements of K are roots of the polynomial x^q - x.

(b) The irreducible factors of the polynomial x^q - x over the prime field F = F_p are the
irreducible polynomials in F[x] whose degrees divide r.

(c) Let K be a field of order q. The multiplicative group K^× of nonzero elements of K is a
cyclic group of order q - 1.

(d) There exists a field of order q, and all fields of order q are isomorphic.

(e) A field of order p^r contains a subfield of order p^k if and only if k divides r.

Corollary 15.7.4 For every positive integer r, there exists an irreducible polynomial of degree
r over the prime field F_p.


Proof. According to (d), there is a field K of order q = p^r. Its degree [K : F] over F = F_p is
r. According to (c), the multiplicative group K^× is cyclic. It is obvious that a generator α for
this cyclic group will generate K as extension field, i.e., that K = F(α). Since [K : F] = r, α
has degree r over F. So α is the root of an irreducible polynomial of degree r. □


We examine a few examples in which q is a power of 2. The irreducible polynomials of
degree at most 4 over F_2 are listed in (12.4.4).

Examples 15.7.5

(i) The field F_4 has degree 2 over F_2. Its elements are the roots of the polynomial

(15.7.6)    x^4 - x = x(x - 1)(x^2 + x + 1).

Note that the factors of x^2 - x appear, because F_4 contains F_2.
Since we are working in characteristic 2, signs are irrelevant: x - 1 = x + 1.

(ii) The field F_8 of order 8 has degree 3 over the prime field F_2. Its elements are the eight
roots of the polynomial x^8 - x. The factorization of this polynomial in F_2[x] is

(15.7.7)    x^8 - x = x(x - 1)(x^3 + x + 1)(x^3 + x^2 + 1).

The cubic factors are the two irreducible polynomials of degree 3 in F_2[x].

To compute in the field F_8, we choose an element β in that field, a root of one of
the irreducible cubic factors, say of x^3 + x + 1. It will have degree 3 over F_2. Then
(1, β, β^2) is a basis of F_8 as a vector space over F_2. The elements of F_8 are the eight
linear combinations of this basis with coefficients 0 and 1:

(15.7.8)    F_8 = {0, 1, β, 1 + β, β^2, 1 + β^2, β + β^2, 1 + β + β^2}.

Computation in F_8 is done using the relations 1 + 1 = 0 and β^3 + β + 1 = 0.

Note that x^2 + x + 1 is not a factor of x^8 - x, and therefore F_8 does not contain F_4. It
couldn't, because [F_8 : F_2] = 3, [F_4 : F_2] = 2, and 2 does not divide 3.

(iii) The field F_16 of order 16 has degree 4 over F_2. Its elements are roots of the polynomial
x^16 - x. This polynomial factors in F_2[x] as

(15.7.9)    x^16 - x = x(x - 1)(x^2 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^4 + x^3 + 1)(x^4 + x + 1).

The three irreducible polynomials of degree 4 in F_2[x] appear here. The factors of
x^4 - x are also among the factors, because F_16 contains F_4. □
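The factorizations of x^4 - x, x^8 - x and x^16 - x over F_2 can be checked by multiplying the factors back together. The Python sketch below stores a polynomial over F_2 as an integer bitmask (bit i is the coefficient of x^i), so that addition is XOR and multiplication is a carry-less product. This encoding and the names used are assumptions of the illustration.

```python
def clmul(a, b):
    """Product of two polynomials over F_2 stored as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add a * (current power of x)
        a <<= 1
        b >>= 1
    return result

def product(factors):
    """Multiply a list of bitmask polynomials together."""
    result = 1
    for f in factors:
        result = clmul(result, f)
    return result

def x_pow_q_minus_x(q):
    """Bitmask of x^q - x, which equals x^q + x over F_2."""
    return (1 << q) | 0b10

# Factors as bitmasks, e.g. 0b1011 is x^3 + x + 1.
FACTORS = {
    4:  [0b10, 0b11, 0b111],
    8:  [0b10, 0b11, 0b1011, 0b1101],
    16: [0b10, 0b11, 0b111, 0b11111, 0b11001, 0b10011],
}

if __name__ == "__main__":
    for q, factors in FACTORS.items():
        print(q, product(factors) == x_pow_q_minus_x(q))
```

A quick consistency check that needs no multiplication: the degrees of the factors add up to q in each case, for example 1 + 1 + 2 + 4 + 4 + 4 = 16.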


We now begin the proof of Theorem (15.7.3). We let F denote the prime field F_p.

Proof of Theorem 15.7.3(a). (the elements of K are roots of x^q - x) Let K be a field of order
q. The multiplicative group K^× has order q - 1. Therefore the order of any element α of K^×
divides q - 1, so α^{q-1} - 1 = 0, which means that α is a root of the polynomial x^{q-1} - 1.
The remaining element of K, zero, is the root of the polynomial x. So every element of K is
a root of x(x^{q-1} - 1) = x^q - x. □


Proof of Theorem 15.7.3(c). (the multiplicative group is cyclic) The proof is based on the
Structure Theorem 14.7.3 for abelian groups, which tells us that K^× is a direct sum of cyclic
groups.

The Structure Theorem was stated with additive notation: A finite abelian group V is
a direct sum C_1 ⊕ ··· ⊕ C_k of cyclic subgroups of orders d_1, ..., d_k, such that each d_i divides
the next: d_1 | d_2 | ··· | d_k. Let d = d_k. If w_i is a generator for C_i, then d_i w_i = 0, and since d_i
divides d, d w_i = 0. Therefore dv = 0 for every element v of V. The order of every element
of V divides d.

Going over to multiplicative notation, K^× is a direct sum of cyclic subgroups, say
H_1 ⊕ ··· ⊕ H_k, where H_i has order d_i, and d_1 | d_2 | ··· | d_k. With d = d_k as before, the order
of every element α of K^× divides d, which means that α^d = 1. Therefore every element of
K^× is a root of the polynomial x^d - 1. This polynomial has at most d roots in K (12.2.20),
and therefore |K^×| = q - 1 ≤ d. On the other hand, |K^×| = |H_1 ⊕ ··· ⊕ H_k| = d_1 ··· d_k. So
d_1 ··· d_k = |K^×| = q - 1 ≤ d. Since d = d_k, the only possibility is that k = 1 and q - 1 = d.
Therefore K^× = H_1, and K^× is cyclic. □
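Parts (a) and (c) of the theorem can be observed directly in a small case. The Python sketch below models F_16 as F_2[x]/(x^4 + x + 1), using the quartic x^4 + x + 1 from (15.7.9). Elements are stored as 4-bit masks; this encoding and the function names are assumptions of the illustration.

```python
MODULUS = 0b10011            # x^4 + x + 1, irreducible over F_2

def mul(a, b):
    """Multiply in F_16: carry-less product reduced modulo x^4 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:      # degree reached 4: subtract the modulus
            a ^= MODULUS
    return result

def power(a, n):
    """a^n by repeated multiplication (fine for such small exponents)."""
    result = 1
    for _ in range(n):
        result = mul(result, a)
    return result

def order(a):
    """Multiplicative order of a nonzero element a."""
    n, current = 1, a
    while current != 1:
        current = mul(current, a)
        n += 1
    return n

orders = {a: order(a) for a in range(1, 16)}
generators = [a for a, n in orders.items() if n == 15]
```

Every one of the 16 elements satisfies a^16 = a, every order divides 15, and the residue of x (the mask `0b10`) is one of the eight generators of the cyclic group of order 15.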


Proof of Theorem 15.7.3(d). (existence of a field with q elements) Since we have proved (a),
we know that the elements of a field of order q will be roots of the polynomial x^q - x.
There exists a field extension L of F in which this polynomial splits completely (15.6.3).
The natural thing to try is to take such a field L and hope for the best, that the roots of
x^q - x in L form the subfield K that we are looking for. This is shown by Lemma 15.7.11
below.


Lemma 15.7.10 Let F be a field of prime characteristic p, and let q = p^r be a positive power
of p.

(a) The polynomial x^q - x has no multiple root in any field extension of F.
(b) In the polynomial ring F[x, y], (x + y)^q = x^q + y^q.


Proof. (a) The derivative of x^q - x is q x^{q-1} - 1. In characteristic p, the coefficient q
is equal to 0, so the derivative is -1. Since the constant polynomial -1 has no root, x^q - x
and its derivative have no common root, and therefore x^q - x has no multiple root (Lemma
15.6.6).

(b) We expand (x + y)^p in Z[x, y]:

    (x + y)^p = x^p + \binom{p}{1} x^{p-1} y + \binom{p}{2} x^{p-2} y^2 + ··· + \binom{p}{p-1} x y^{p-1} + y^p.

Lemma 12.4.8 tells us that the binomial coefficients \binom{p}{r} are divisible by p for r in the range
1 ≤ r < p. Since F has characteristic p, the map Z[x, y] → F[x, y] sends these coefficients
to zero, and (x + y)^p = x^p + y^p in F[x, y]. The fact that (x + y)^q = x^q + y^q when q = p^r
follows by induction. □

Lemma 15.7.11 Let p be a prime and let q = p^r be a positive power of p. Let L be a field
of characteristic p, and let K be the set of roots of x^q - x in L. Then K is a subfield of L.


Proof. Let α and β be roots of the polynomial x^q - x in L. We have to show that α + β, -α,
αβ, α^{-1} (if α ≠ 0), and 1 are roots of the same polynomial. So we assume that α^q = α and
β^q = β. The proofs that αβ, α^{-1}, and 1 are roots are obvious enough that we omit them.
Substitution into Lemma 15.7.10(b) shows that (α + β)^q = α^q + β^q = α + β.

Finally, we verify that -1 is a root of x^q - x. Since products of roots are roots, it will
follow that -α is a root. If p ≠ 2, then q is an odd integer, and it is true that (-1)^q = -1.
If p = 2, q is even, and (-1)^q = 1. But in this case, the characteristic of L is 2, so
1 = -1 in L. □
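The additivity of the qth power map, Lemma 15.7.10(b), is what makes the sum of two roots a root. It rests on the divisibility of the middle binomial coefficients by p, which is easy to test numerically. The short Python sketch below does so; the function names are assumptions of this illustration.

```python
from math import comb

def middle_binomials_mod(p):
    """Residues modulo p of the binomial coefficients C(p, r), 0 < r < p."""
    return [comb(p, r) % p for r in range(1, p)]

def dream_holds(p, r=1):
    """True when every middle coefficient of (x + y)^(p^r) is divisible by p."""
    q = p ** r
    return all(comb(q, j) % p == 0 for j in range(1, q))
```

For a prime such as 7 all the residues are zero, and the same is true of the exponent q = 9 modulo 3. The check fails for a modulus that is not prime: with 4 in place of p, the coefficient C(4, 2) = 6 is not divisible by 4.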


We must still show that two fields K and K' of the same order q = p^r are isomorphic.
Let α be a generator for the cyclic group K^×. Then K = F(α), so the irreducible polynomial
f for α over F has degree equal to [K : F] = r. Then f generates the ideal of polynomials
in F[x] with root α, and since α is also a root of x^q - x, f divides x^q - x. Since x^q - x splits
completely in K', f has a root α' in K' too. Then F(α) and F(α') are both isomorphic to
F[x]/(f), hence to each other. Counting degrees shows that F(α') = K', so K and K' are
isomorphic. □


Proof of Theorem 15.7.3(e). (subfields of F_q) Let q = p^r and q' = p^k. Since
[F_q : F_p] = r and [F_{q'} : F_p] = k, we can't have F_p ⊂ F_{q'} ⊂ F_q unless k divides r.
Suppose that k divides r, say r = ks, and let K be a field of order q. Substitution of y = p^k
into the equation y^s - 1 = (y - 1)(y^{s-1} + ··· + y + 1) shows that q' - 1 divides q - 1. Since
the multiplicative group K^× is cyclic of order q - 1, and since q' - 1 divides q - 1, K^×
contains an element β of order q' - 1. The q' - 1 powers of this element are roots of
x^{q'-1} - 1 in K. Therefore x^{q'} - x splits completely in K. Lemma 15.7.11 shows that the
roots form a field of order q'. □
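Part (e) can be watched at work in characteristic 2. In the Python sketch below, a field of order 2^r is modeled as F_2[x]/(m) for an irreducible polynomial m given as a bitmask, and the roots of x^{q'} - x are found by brute force. The encoding and the names `make_field`, `power` and `roots_of_x_q_minus_x` are assumptions of this illustration.

```python
def make_field(modulus):
    """Return (elements, mul) for F_2[x]/(modulus); modulus is an irreducible bitmask."""
    degree = modulus.bit_length() - 1
    top = 1 << degree

    def mul(a, b):
        result = 0
        while b:
            if b & 1:
                result ^= a
            b >>= 1
            a <<= 1
            if a & top:          # reduce as soon as the degree reaches deg(modulus)
                a ^= modulus
        return result

    return list(range(top)), mul

def power(mul, a, n):
    """a^n by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = mul(result, a)
    return result

def roots_of_x_q_minus_x(modulus, q):
    """Elements a of the field with a^q = a."""
    elements, mul = make_field(modulus)
    return [a for a in elements if power(mul, a, q) == a]
```

With the modulus x^4 + x + 1 (mask `0b10011`), the polynomial x^4 - x has four roots in F_16, and they are closed under addition and multiplication: this is the subfield F_4. With the modulus x^3 + x + 1 (mask `0b1011`), the same polynomial has only the roots 0 and 1 in F_8, in agreement with the fact that 2 does not divide 3.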


Proof of Theorem 15.7.3(b). (the irreducible factors of x^q - x) Let g be an irreducible
polynomial over F of degree k. The polynomial x^q - x factors into linear factors in K
because it has q roots in K. If g divides x^q - x, it will also factor into linear factors, so it
will have a root β in K. The degree of β over F divides [K : F] = r, and is equal to k. So k
divides r. Conversely, suppose that k divides r. Let β be a root of g in an extension field of
F. Then [F(β) : F] = k, and by (e), K contains a subfield isomorphic to F(β). Therefore g
has a root in K, and so g divides x^q - x.

This completes the proof of Theorem 15.7.3. □
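Since x^q - x has no multiple roots (Lemma 15.7.10(a)), part (b) says that it is the product of the irreducible polynomials whose degrees divide r, each taken once. Comparing degrees gives a counting identity: q is the sum of d·N_d over the divisors d of r, where N_d denotes the number of irreducible polynomials of degree d. The Python sketch below tests this for p = 2 by brute force. The bitmask encoding, the trial-division test, and the notation N_d are assumptions of this illustration.

```python
def poly_mod(a, b):
    """Remainder of a modulo b, for polynomials over F_2 stored as bitmasks."""
    width = b.bit_length()
    while a.bit_length() >= width:
        a ^= b << (a.bit_length() - width)
    return a

def is_irreducible(f):
    """Trial division by every polynomial of degree 1 up to deg(f) // 2."""
    degree = f.bit_length() - 1
    if degree < 1:
        return False
    for g in range(2, 1 << (degree // 2 + 1)):
        if poly_mod(f, g) == 0:
            return False
    return True

def count_irreducibles(d):
    """Number of irreducible polynomials of degree d over F_2."""
    return sum(is_irreducible(f) for f in range(1 << d, 1 << (d + 1)))

def degree_sum(r):
    """Sum of d * N_d over the divisors d of r."""
    return sum(d * count_irreducibles(d) for d in range(1, r + 1) if r % d == 0)
```

For r = 4 the sum is 1·2 + 2·1 + 4·3 = 16, matching the six factors displayed in (15.7.9).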


15.8 PRIMITIVE ELEMENTS 


Let K be a field extension of a field F. An element α that generates K/F, i.e., such that
K = F(α), is called a primitive element for the extension. Primitive elements are useful
because computation in F(α) can be done easily, provided that the irreducible polynomial
for α over F is known.


Theorem 15.8.1 Primitive Element Theorem. Every finite extension K of a field F of
characteristic zero contains a primitive element.

The statement is true also when F is a finite field, though the proof is different. For an
infinite field of characteristic p ≠ 0, the theorem requires an additional hypothesis. Since we
won’t be studying such fields, we won’t consider that case.


Proof of the Primitive Element Theorem. Since the extension K/F is finite, K is generated
by a finite set. For example, a basis for K as F-vector space will generate K over F. Say
that K = F(α_1, ..., α_k). We use induction on k. There is nothing to prove when k = 1. For
k > 1, induction allows us to assume the theorem true for the field K_1 = F(α_1, ..., α_{k-1})
generated by the first k - 1 elements α_i. So we may assume that K_1 is generated by a
single element β. Then K will be generated by the two elements α_k and β. The proof of
the theorem is thereby reduced to the case that K is generated by two elements. The next
lemma takes care of this case. □


Lemma 15.8.2 Let F be a field of characteristic zero, and let K be an extension field that is
generated over F by two elements α and β. For all but finitely many c in F, γ = β + cα is a
primitive element for K over F.


Proof. Let f(x) and g(x) be the irreducible polynomials for α and β, respectively, over
F, and let K' be a field extension of K in which f and g split completely. Call their roots
α_1, ..., α_m and β_1, ..., β_n, respectively, with α = α_1 and β = β_1.

Since the characteristic is zero, the roots α_i are distinct, as are the roots β_j (15.6.8)(b).
Let γ_ij = β_j + cα_i, with i = 1, ..., m and j = 1, ..., n. When (i, j) ≠ (k, ℓ), the equation
γ_ij = γ_kℓ holds for at most one c. So for all but finitely many elements c of F, the γ_ij will
be distinct. We will show that if c avoids these “bad” values, then γ_11 = β_1 + cα_1 will be a
primitive element. We drop the subscript, and write γ = β_1 + cα_1.

Let L = F(γ). To show that γ is a primitive element, it will be enough to show that α_1
is in L. Then β_1 = γ - cα_1 will be in L too, and therefore L will be equal to K. To begin
with, α_1 is a root of f(x). The trick is to use g to cook up a second polynomial with α_1 as a
root, namely h(x) = g(γ - cx). This polynomial doesn't have coefficients in F, but because
g is in F[x], c is in F, and γ is in L, the coefficients of h are in L.

We inspect the greatest common divisor d of f and h. It is the same, whether computed
in L[x] or in the extension field K'[x] (15.6.4). Since f(x) = (x - α_1)···(x - α_m) in K', d
is the product of the factors x - α_i that also divide h, i.e., those such that α_i is a common
root of h and f. One common root is α_1. If we show that this is the only common root, it
will follow that d = x - α_1, and because the greatest common divisor is an element of L[x]
(15.6.4)(d), that α_1 is an element of L.

So all we have to do is check that α_i is not a root of h when i > 1. We substitute:
h(α_i) = g(γ - cα_i). The roots of g are β_1, ..., β_n, so we must check that γ - cα_i ≠ β_j
for any j, or that β_1 + cα_1 ≠ β_j + cα_i. This is true because c has been chosen so that the
elements γ_ij are distinct. □
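A standard small case may help to fix ideas: take F = Q, α = √2 and β = √3, which are our choices for illustration and not an example from the text. The Python sketch below checks numerically, in floating point and therefore only approximately, that c = 1 avoids the bad values, that γ = √3 + √2 satisfies the quartic x^4 - 10x^2 + 1, and that α and β are polynomials in γ with rational coefficients.

```python
from itertools import product
from math import isclose, sqrt

alphas = [sqrt(2), -sqrt(2)]        # roots of f(x) = x^2 - 2
betas = [sqrt(3), -sqrt(3)]         # roots of g(x) = x^2 - 3

def gammas(c):
    """All the elements gamma_ij = beta_j + c * alpha_i."""
    return [b + c * a for a, b in product(alphas, betas)]

def all_distinct(values, tol=1e-9):
    """Numerical test that no two of the values coincide."""
    return all(abs(u - v) > tol
               for k, u in enumerate(values) for v in values[k + 1:])

c = 1
gamma = betas[0] + c * alphas[0]                 # sqrt(3) + sqrt(2)

# gamma is a root of x^4 - 10 x^2 + 1 (the residual is zero up to rounding).
residual = gamma**4 - 10 * gamma**2 + 1

# alpha and beta expressed as rational polynomials in gamma.
alpha_from_gamma = (gamma**3 - 9 * gamma) / 2
beta_from_gamma = (11 * gamma - gamma**3) / 2
```

The value c = 0 is a bad one here: the elements γ_ij then reduce to ±√3, each occurring twice, and γ = β alone does not generate the field.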


15.9 FUNCTION FIELDS 


In this section we look at function fields, the third class of field extensions mentioned at the
beginning of the chapter. The field C(t) of rational functions in t will be denoted by F. Its
elements are fractions p/q of complex polynomials, with q ≠ 0. Function fields are finite
field extensions of F.

Let α be a primitive element for an extension field K of F of degree n, and let f be the
irreducible polynomial for α over F, so that K = F(α) is isomorphic to the field F[x]/(f),
with α corresponding to the residue of x. By clearing denominators, we make f into a
primitive polynomial that we write as a polynomial in x:

(15.9.1)    f(t, x) = a_n(t) x^n + ··· + a_1(t) x + a_0(t).

The hypothesis that f is a primitive polynomial means that the coefficients a_i(t) are
polynomials in t with greatest common divisor 1, and that a_n(t) is monic (12.3.9). The
Riemann surface X of such a polynomial was defined in Section 11.9, as the locus of zeros
{f = 0} in complex (t, x)-space C^2. It was shown there that X is an n-sheeted branched
covering of the complex t-plane T (11.9.16). The branch points are the points t = t_0 of T
at which the one-variable polynomial f(t_0, x) has fewer than n roots, which happens when
f(t_0, x) has a multiple root, or when t_0 is a root of the leading coefficient a_n(t) of f (11.9.17).

As before, we use the notation X’ for a set obtained by deleting an unspecified finite 
subset from X, and instead of saying that some statement is true except at a finite set of 
points of X, we will say that it is true on X’. 


An isomorphism of extension fields K and L of F was defined in (15.2.9). It is an
isomorphism of fields φ: K → L that restricts to the identity on F:

(15.9.2)
        K ----φ----> L
        ↑            ↑
        F ========== F

The vertical arrows in this diagram are the inclusions of F as a subfield into K and L, and
the long equality symbol stands for the identity map.

• An isomorphism of branched coverings X and Y of T is a continuous, bijective map
η: X' → Y' that is compatible with the projections of these surfaces to T:

(15.9.3)
        X' ----η----> Y'
        ↓             ↓
        T =========== T

The primes indicate that we expect to delete finite sets of points from X and Y in order that
the map η be defined and bijective.

Speaking a bit loosely, we call a branched covering π: X → T path connected if X' is
path connected, by which we mean that for every finite subset A of X, the set X - A is path
connected.


The object of this section is to explain the next theorem, which describes function fields
in terms of their Riemann surfaces.

Theorem 15.9.4 Riemann Existence Theorem. There is a bijective correspondence between
isomorphism classes of function fields of degree n over F and isomorphism classes of
connected, n-sheeted branched coverings of T, such that the class of the field extension K defined
by an irreducible polynomial f(t, x) corresponds to the class of its Riemann surface X.


This theorem gives us a way to decide when two polynomials of the same degree in 
x define isomorphic field extensions. A simple criterion that can often be used is that the 
branch points of their Riemann surfaces must match up. However, the theorem fails to tell 
us how to find a polynomial with a given branched cover as its Riemann surface. It cannot do 
this. Many polynomials define isomorphic field extensions, and finding something is difficult 
when there are many choices. 


The proof of the theorem is too long to include, but one part is rather easy to verify: 


Proposition 15.9.5 Let f(t, x) and g(t, y) be irreducible polynomials in C[t, x] and C[t, y],
respectively. Let K = F[x]/(f) and L = F[y]/(g) be the field extensions they define, and
let X and Y be the Riemann surfaces {f = 0} and {g = 0}. If K/F and L/F are isomorphic
field extensions, then X and Y are isomorphic branched coverings of T.


Proof. The residue of y in L = F[y]/(g), let’s call it β, is a root of g, i.e., g(t, β) = 0, and
an F-isomorphism φ: K → L gives us a root of g in K, namely γ = φ^{-1}(β). So g(t, γ) = 0.
As is true for any element of K = F[x]/(f), γ can be represented as the residue modulo
(f) of an element of F[x]. We let u be such an element, and we define the isomorphism
η: X → Y by η(t, x) = (t, u(t, x)).

We must show that if (t, x) is a point of X, then (t, u) is a point of Y. Since g(t, γ) = 0
in K and since u is an element of F[x] that represents γ, g(t, u) is in the ideal (f). There
is an element h of F[x] such that

    g(t, u) = f h.

If (t, x) is a point of X, then f(t, x) = 0, and so g(t, u) = 0 too. Therefore (t, u) is indeed
a point of Y. However, since u and h are elements of F[x], their coefficients are rational
functions in t that may have denominators. So η may be undefined at a finite set of points.

The inverse function to η is obtained by interchanging the roles of K and L. □


Cut and Paste 


“Cut and paste” is a procedure to construct or deconstruct a branched covering.


We go back to our example of the Riemann surface X of the polynomial x^2 - t and
write x = x_0 + x_1 i as before. If we cut X open along the double locus of Figure 11.9.15, the
negative real t-axis, it decomposes into the two parts x_0 > 0 and x_0 < 0. Each of these parts
projects bijectively to T, provided that we disregard what happens along the cut.

Turning this procedure around, we can construct a branched covering isomorphic to
X in the following way: We stack two copies S_1, S_2 of the complex plane over T and cut
them open along the negative real axis. These copies of T will be called sheets. Then we glue
side A of the cut on S_1 to side B of the cut on S_2 and vice versa. (This cannot be done in
three-dimensional space.)

(15.9.6)    Sides A and B.  [Figure: a cut, with its two sides labeled side A and side B.]


Suppose we are given an n-sheeted branched covering X → T, and let A =
{p_1, ..., p_k} be the set of its branch points in T. For ν = 1, ..., k, we choose
nonintersecting half lines C_ν that lead from p_ν to infinity. We cut T open along these half lines,
and we also cut X open at all points that lie over them.

We should be specific about what we mean by cutting. Let’s agree that cutting T open
means removing all points of the half lines C_ν, including p_ν, and that cutting X open means
removing all points that lie over those half lines.


Lemma 15.9.7 When X is cut open above the half lines C_ν, it decomposes as a union of n
“sheets” S_1, ..., S_n, which can be numbered arbitrarily. Each sheet projects bijectively to
the cut plane T.

This is true because the cut surface X is an unbranched covering space of the cut plane T,
which is a simply connected set: Any loop in the cut plane can be contracted continuously
to a point. It is intuitively plausible that every unbranched covering of a simply connected
space decomposes completely. The sheet that contains a point p of X consists of all points
that can be joined to p by a path without crossing the cuts. (This is an exercise in [Munkres],
p. 342). □

(15.9.8)    The Cut Plane T.  [Figure: the plane T cut open along a half line labeled C_ν.]


Now to reconstruct the surface X we take n copies of the cut plane T, we call them
“sheets” and label them as S_1, ..., S_n. We stack them up over T. Except for the cuts, the
union of these sheets is our branched covering. We must describe the rule for gluing the
sheets back together along the cuts. On T, we make a loop ℓ_ν that circles a branch point
p_ν in the counterclockwise direction, and we call the side of C_ν we pass through before
crossing C_ν “side A” and the side we pass through after crossing “side B.” We label
the corresponding sides of the sheet S_i as side A_i and side B_i, respectively. Then the rule
for gluing X amounts to instructions that side A_i is glued to side B_j for some j. This rule is
described by the permutation σ_ν of the indices 1, ..., n that sends i ⇝ j.

It seems clear that we can construct a covering using an arbitrary set of permutations
σ_ν, except that what should happen above the branch points themselves is not clear. To
avoid ambiguity, we simply delete all branch points and all points that lie over them.

• Branching Data: For ν = 1, ..., k, a permutation σ_ν of the indices 1, ..., n.
• Gluing Instructions: If σ_ν(i) = j, glue side A_i to side B_j along the cut C_ν.


When the gluing is done no cuts remain, and the union of the sheets is our covering. As is 
true of the Riemann surface depicted in Figure 11.9.15, four dimensions will be needed to 
do the gluing without self crossings. 

If σ_ν is the trivial permutation, then each sheet is glued to itself above C_ν. Then that
cut isn’t needed, and we say that p_ν is not a true branch point.

The next lemma restates the above discussion.

Lemma 15.9.9 Every n-sheeted branched covering X → T is isomorphic to one constructed
by the cut-and-paste process. □


Note: The numbering of the sheets is arbitrary, and the concept of a “top sheet” has no
intrinsic meaning for a Riemann surface. If there were a top sheet, one could define x as a
single valued function of t by choosing the value on that sheet. One can do this only after
the Riemann surface has been cut open. Wandering around on X leads from one sheet to
another. □


Except for the arbitrary numbering of the sheets, the permutations σ_ν are uniquely
determined by the branched covering X. A change of numbering by a permutation ρ will
change each σ_ν to the conjugate ρ^{-1} σ_ν ρ.


Lemma 15.9.10 Let X and Y be branched coverings constructed by cut and paste, using the
same points p_ν and half lines C_ν. Let the permutations defining their gluing data be σ_ν and
τ_ν, respectively. Then X and Y are isomorphic branched coverings if and only if there is a
permutation ρ such that τ_ν = ρ^{-1} σ_ν ρ for each ν. □
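For a small number of sheets, the criterion of this lemma can be tested by trying every renumbering ρ. The Python sketch below does this. Permutations are stored as tuples acting on the indices 0, ..., n - 1 (so sheet S_1 has index 0), and the composition convention stated in the comments is a choice of this illustration; since the search runs over all ρ, the answer does not depend on that choice.

```python
from itertools import permutations

def compose(s, t):
    """The permutation 'first t, then s': i is sent to s[t[i]]."""
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    """Inverse permutation."""
    inv = [0] * len(s)
    for i, j in enumerate(s):
        inv[j] = i
    return tuple(inv)

def conjugate(s, rho):
    """The conjugate rho^-1 s rho."""
    return compose(inverse(rho), compose(s, rho))

def find_renumbering(sigmas, taus):
    """Return one rho with taus[v] == rho^-1 sigmas[v] rho for all v, else None."""
    n = len(sigmas[0])
    for rho in permutations(range(n)):
        if all(conjugate(s, rho) == t for s, t in zip(sigmas, taus)):
            return rho
    return None
```

With three sheets, the data ((12), (23)) and ((12), (13)) are related by the renumbering that switches the first two sheets, so `find_renumbering` succeeds. The data ((12), (12)) and ((12), (23)) are not related by any renumbering, and the function returns `None`.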


Lemma 15.9.11 The branched covering X constructed by cut and paste is path connected if
and only if the permutations σ_1, ..., σ_k generate a subgroup H of the symmetric group that
operates transitively on the indices 1, ..., n.


Proof. Each sheet is path connected. If the permutation σ_ν sends the index i to j, the sheets
S_i and S_j are glued together along the cut C_ν. Then there will be a short path across the cut
that leads from a point of S_i to a point of S_j, and because the sheets themselves are path
connected, all points of S_i ∪ S_j can be connected by paths. So X is path connected if and
only if, for every pair of indices i, j, there is a sequence of the permutations σ_ν that carries
i = i_0 ⇝ i_1 ⇝ ··· ⇝ i_d = j. This will be true if and only if H operates transitively. □

Example 15.9.12 The simplest k-sheeted path connected branched coverings of T are
branched at a single point. Let Y be such a covering, branched only at the origin t = 0.
The branching data for Y consists of a single permutation σ, the one that corresponds to a
loop around the origin. The previous lemma tells us that, since Y is path connected, σ must
operate transitively on the k indices, and the only permutations that operate transitively are
the cyclic permutations of order k. So with suitable numbering of the sheets, σ = (1 2 ··· k).
There is, up to isomorphism, exactly one k-sheeted branched covering branched only at the
origin. The Riemann Existence Theorem tells us that there is, up to isomorphism, a unique
field extension with this Riemann surface. It is not hard to guess this field extension: it is the
one defined by the polynomial y^k - t, i.e., K = F(y), where y = t^{1/k} is a kth root of t. The
Riemann surface Y has k sheets. It is branched only at the origin because each t different
from zero has k complex kth roots.

There are two more things to be said here. First, the theorem asserts that this is the only
field extension of degree k branched at the single point t = 0. This isn’t obvious. Second,
the same field extension K = F(y) can be generated by many elements. For most choices of
generators, it would not be obvious that there is only one true branch point. □
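The transitivity condition of Lemma 15.9.11 is easy to test: starting from one sheet, collect every sheet that can be reached by applying the given permutations. For permutations of a finite set, this orbit is the same as the orbit under the subgroup H they generate, because inverses are powers. The Python sketch below uses tuples acting on the indices 0, ..., n - 1; the encoding and the function names are assumptions of this illustration.

```python
def orbit(start, sigmas):
    """Indices reachable from `start` by repeatedly applying the permutations."""
    seen = {start}
    frontier = [start]
    while frontier:
        i = frontier.pop()
        for s in sigmas:
            j = s[i]
            if j not in seen:
                seen.add(j)
                frontier.append(j)
    return seen

def is_path_connected(sigmas):
    """Lemma 15.9.11: connected exactly when the orbit of one index is everything."""
    n = len(sigmas[0])
    return len(orbit(0, sigmas)) == n

def k_cycle(k):
    """The cyclic permutation (1 2 ... k), written on the indices 0, ..., k - 1."""
    return tuple((i + 1) % k for i in range(k))
```

A single k-cycle gives a connected covering, as in Example 15.9.12. A single transposition on three sheets does not, because the third sheet is glued only to itself, while the two transpositions (12) and (23) together do.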


Computing the Permutations 


Given a polynomial f(t, x), one wishes to determine the permutations σ_ν that define the
gluing data of its Riemann surface. Two problems present themselves. First, the “local
problem:” At each branch point p one must determine the permutation σ of the sheets that
occurs when one circles that point. As we have seen, σ depends on the numbering of the
sheets. Second, one must take care to use the same numbering for each branch point. This
is the more difficult problem. A computer has no problem with it, but except in very simple
cases, it is difficult to do by hand.


To compute the permutations, the computer chooses a “base point” b in the cut plane
T and computes the n roots of the polynomial f(b, x) numerically, with a suitable accuracy.
It numbers these roots arbitrarily, say γ_1, ..., γ_n, and labels the sheets by calling S_i the
sheet that contains the root γ_i. Then it walks to a point b_ν in the vicinity of a branch point
p_ν, taking care not to cross any of the cuts. The roots γ_i vary continuously, and the computer
can follow this variation by recomputing roots every time it takes a small step. This tells it
how to label the sheets at the point b_ν. Then to determine the permutation σ_ν, the computer
follows a counterclockwise loop ℓ_ν around p_ν, again recomputing roots as it goes along.
Because the loop crosses the cut C_ν, the roots will have been permuted by σ_ν when the path
returns to b_ν. In this way, the computer determines σ_ν. And because the numbering has
been established at the base point b, it will be the same for all of the branch points.

Needless to say, doing this by hand is incredibly tedious. We find ways to get around
the problem in the examples we present below.
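The root-following procedure can be sketched in a few lines of Python. As a test case we take the cubic f(t, x) = x^3 - 3x + t, whose branch points are t = ±2, with base point b = 0 and with the cuts running outward along the real axis from ±2. The text does not prescribe how the roots are recomputed; the use of Newton's method, the step counts, the radius of the loops, and all function names below are assumptions of this illustration.

```python
import cmath

def f(t, x):
    return x**3 - 3*x + t

def df_dx(t, x):
    return 3*x**2 - 3

def follow(roots, path):
    """Track each root along the path, refining it by Newton steps at every point."""
    roots = list(roots)
    for t in path:
        for k, x in enumerate(roots):
            for _ in range(20):
                x = x - f(t, x) / df_dx(t, x)
            roots[k] = x
    return roots

def segment(a, b, steps=400):
    """Points on the straight path from a to b (a itself excluded)."""
    return [a + (b - a) * s / steps for s in range(1, steps + 1)]

def circle(center, start, steps=2000):
    """Counterclockwise loop around `center` that begins and ends at `start`."""
    return [center + (start - center) * cmath.exp(2j * cmath.pi * s / steps)
            for s in range(1, steps + 1)]

def monodromy(base_roots, base, branch_point, radius=0.5):
    """For each root index, the index of the base root it arrives at after the loop."""
    direction = (branch_point - base) / abs(branch_point - base)
    near = branch_point - radius * direction
    path = segment(base, near) + circle(branch_point, near) + segment(near, base)
    end_roots = follow(base_roots, path)
    return [min(range(len(base_roots)), key=lambda j: abs(base_roots[j] - x))
            for x in end_roots]

base = 0
base_roots = [-3**0.5, 0.0, 3**0.5]        # the roots of f(0, x) = x^3 - 3x
```

With the roots numbered as in `base_roots`, the loop around t = 2 interchanges the second and third roots, and the loop around t = -2 interchanges the first and second. Both permutations are transpositions, and they are expressed in the same numbering because both loops start from the same base point.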


The local problem can be solved by analytic methods, and we give an incomplete
analysis here. The method is to relate the Riemann surface to one that we know, namely to
the Riemann surface Y of the polynomial y^k - t. Let t_0 be a branch point of the Riemann
surface X : {f(t, x) = 0}, where f is a polynomial of the form (15.9.1). Substituting t = t_0,
we obtain the one-variable polynomial f^0(x) = f(t_0, x).


Lemma 15.9.13 Let x_0 be a root of f^0(x). Suppose that
• x_0 is a k-fold root of f^0(x), and
• the partial derivative ∂f/∂t is not zero at the point (t_0, x_0).

Then the permutation of the sheets at the point t_0 contains a k-cycle.


Proof. We change variables to move the point (t_0, x_0) to the origin (0, 0), so that f^0(x) =
f(0, x), and we write f(t, x) = f^0(x) - t v(t, x). Then (∂f/∂t)(0, 0) = -v(0, 0). Our hypotheses
tell us that v(0, 0) ≠ 0. Also, since x = 0 is a k-fold root of f^0(x), that polynomial has the
form x^k u(x) where u(x) is a polynomial in x and u(0) ≠ 0. Then f(t, x) = x^k u(x) - t v(t, x).
Let c = u(0)/v(0, 0). We replace t by c^{-1} t. The result is that now u(0)/v(0, 0) = 1.

We restrict attention to a small neighborhood U of the origin (0, 0) in (t, x)-space, and write
the equation f = 0 as

    x^k u/v = t.

For (t, x) in U, u/v is near to 1. Among the kth roots of u/v, one will be near to 1, and that
root, call it w, depends continuously on the point (t, x) in U. The other kth roots will be
ζ^ν w, where ζ = e^{2πi/k}.

Let y = xw. Then in our neighborhood U, the equation f(t, x) = 0 is equivalent with
y^k = t. Therefore there are k sheets of our Riemann surface X that intersect U, and when
we make a loop around the point t = 0, those k sheets will be permuted in the same way as
the sheets of the Riemann surface Y, i.e., cyclically. □


We now describe the branching data for a few simple polynomials. We take polynomials
that are monic in x. The branch points will be the points t_0 at which f(t_0, x) has multiple
roots, that is, the points at which f(t_0, x) and (∂f/∂x)(t_0, x) have a common root. Lemma 15.9.13
will be our main tool.


Examples 15.9.14 (a) f(t, x) = x^2 - t^3 + t,  ∂f/∂x = 2x,  ∂f/∂t = -3t^2 + 1.

Here X is a two-sheeted covering of T. There are three branch points t = 0, t = 1,
and t = -1, and ∂f/∂t ≠ 0 at all of them. So the permutation of the sheets at each of these points
contains a two-cycle. Since there are two sheets, each of the permutations is the transposition
(12). We don’t need to be careful about the numbering when there are two sheets.


(b) We ask for a path connected, three-sheeted branched covering X of T branched at two
points p_1 and p_2, and such that the permutation σ_i at the point p_i is a transposition.

We may label the sheets so that σ_1 = (12). Then because X is path connected,
the permutation σ_2 must be either (23) or (13) (15.9.11). Switching the sheets called S_1
and S_2 doesn’t affect σ_1, but it interchanges the two other transpositions, so with suitable
numbering of the sheets, σ_1 = (12) and σ_2 = (23). There is just one isomorphism class of
such coverings.

The Riemann Existence theorem tells us that there is, up to isomorphism, a unique 
field extension K of F with this covering as its Riemann surface. Of course K will depend 
on the location of the two branch points but they can be moved to any position by a linear 
change of variable in f. 

How do we find a polynomial $f(t, x)$ whose Riemann surface has this form? There is
no general method, so one has to guess, and this case is simple enough that it can be guessed
fairly easily. Since there is very minimal branching, we look for a very simple polynomial
that is cubic in $x$. It takes a bit of courage to start looking, but one of the first attempts might
be a polynomial of the form $x^3 + x + t$. This will work, but let’s take $f(t, x) = x^3 - 3x + t$
instead. Then $\frac{\partial f}{\partial x} = 3x^2 - 3$ and $\frac{\partial f}{\partial t} = 1$. Substituting the roots $x = \pm 1$ of $\frac{\partial f}{\partial x}$ into $f$, one finds
that the branch points are the points $t = \pm 2$. Since $\frac{\partial f}{\partial t}$ is nowhere zero, Proposition 15.9.13
applies.

There is a double root at the point $p_1 = (t, x) = (2, 1)$, because $f(2, x) = (x - 1)^2 (x + 2)$. So $\sigma_1$ contains a 2-cycle, a transposition.
Similarly, $\sigma_2$ is a transposition. So apart from the location of the two branch points, the
Riemann surface $X$ of the polynomial $f = x^3 - 3x + t$ has the desired properties, and
$F[x]/(f)$ defines the field extension with that branching.
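The branching just described can be checked numerically. The following is a minimal sketch, not a general algorithm: it follows the three roots of $x^3 - 3x + t$ by Newton's method while $t$ runs once counterclockwise around a circle through the base point $t = 0$ that encloses a single branch point. The function names, the step count, and the choice of base point are assumptions made for this illustration.

```python
import cmath
import math

def f(t, x):
    # The polynomial of Example (b): f(t, x) = x^3 - 3x + t.
    return x**3 - 3*x + t

def df_dx(t, x):
    # Partial derivative of f with respect to x.
    return 3*x**2 - 3

def continue_roots(roots, path, newton_steps=6):
    # Follow each root continuously along the path of t-values,
    # refining it by a few Newton iterations at every step.
    roots = list(roots)
    for t in path:
        for j, x in enumerate(roots):
            for _ in range(newton_steps):
                x = x - f(t, x) / df_dx(t, x)
            roots[j] = x
    return roots

def sheet_permutation(center, start_roots, n=4000):
    # Counterclockwise circle around `center` that starts and ends at t = 0.
    path = [center * (1 - cmath.exp(2j * math.pi * k / n)) for k in range(1, n + 1)]
    end_roots = continue_roots(start_roots, path)
    # Entry j is the index of the starting root at which sheet j arrives.
    return [min(range(len(start_roots)), key=lambda i: abs(start_roots[i] - z))
            for z in end_roots]

# Roots of f(0, x) = x^3 - 3x, numbered 0, 1, 2.
start = [-math.sqrt(3), 0.0, math.sqrt(3)]
perm_at_2 = sheet_permutation(2, start)
perm_at_minus_2 = sheet_permutation(-2, start)
print(perm_at_2, perm_at_minus_2)
```

With this numbering, the loop around $t = 2$ interchanges the two sheets whose roots merge at $x = 1$, and the loop around $t = -2$ interchanges the two sheets whose roots merge at $x = -1$. Both permutations are transpositions, and together they act transitively on the three sheets.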


(c) $f(t, x) = x^3 - t^3 + t^2$, $\frac{\partial f}{\partial x} = 3x^2$, $\frac{\partial f}{\partial t} = -3t^2 + 2t$.


Here $X$ is a three-sheeted covering of $T$. The branch points are at $t = 0$ and $t = 1$, and
both $f(0, x)$ and $f(1, x)$ have triple roots. Let $\sigma_0$ and $\sigma_1$ denote the permutations of the
sheets at the branch points. The partial derivative $\frac{\partial f}{\partial t}$ is not zero at $t = 1$, so the three sheets
are permuted cyclically there. With suitable numbering, $\sigma_1$ will be $(123)$.


The point $t = 0$ presents problems. First, $\frac{\partial f}{\partial t}$ vanishes there. Second, how can we make
sure to use the same numbering of the sheets at the two points? In the previous example,
knowing that the Riemann surface must be path connected was enough to determine the
branching. This fact gives us no information here because $\sigma_1$ operates transitively on the
sheets by itself.

We use a trick that works only in the simplest cases. That is to compute the permutation
that we get by walking around a large circle $\Gamma$. A large circular path will cross each of the
cuts once (see Figure 15.9.8), so the sheets will be permuted by the product permutation
$\sigma_0\sigma_1$, or by $\sigma_1\sigma_0$, depending on where we start. If we can determine that permutation, then
since we know $\sigma_1$, we will be able to recover $\sigma_0$.

The substitution $t = u^{-1}$ maps $T$ bijectively to the complex $u$-plane $U$, except that it
is undefined at the points $t = 0$ and $u = 0$. Because $u \to 0$ as $t \to \infty$, the point $u = 0$ of $U$ is
called the point at infinity of $T$. Our large circle $\Gamma$ in $T$ corresponds to a small circle, we’ll call
it $L$, that circles the origin in $U$. However, a counterclockwise walk around $\Gamma$ corresponds
to a clockwise walk around $L$: If $t = re^{i\theta}$, then $u = r^{-1}e^{-i\theta}$.

We make the substitution $t = u^{-1}$ into the polynomial $f = x^3 - t^3 + t^2$ and clear
denominators, obtaining $x^3u^3 - 1 + u$. When analyzing such a substitution, one usually has
to substitute for $x$ as well. It seems clear here that we should set $y = ux$. This gives us


$y^3 - 1 + u$.

Let’s call this polynomial $g(u, y)$. The Riemann surfaces $X$ and $Y: \{g = 0\}$ correspond via
the substitution $(x, t) \leftrightarrow (y, u)$, which is defined and invertible except above the origins in
the planes $T$ and $U$. Therefore the permutation of sheets of $X$ defined by a counterclockwise
walk around $\Gamma$ will be the same as the permutation of sheets of $Y$ defined by a clockwise
walk around $L$. That permutation is trivial, because the Riemann surface $Y$ is not branched
at $u = 0$. Therefore $\sigma_0\sigma_1 = 1$, and since $\sigma_1 = (123)$, $\sigma_0 = (321)$. $\square$
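As a cross-check on this answer, one can read the two permutations directly off the equation $x^3 = t^3 - t^2$. The sketch below assumes that both loops are counterclockwise and start at a common base point between the branch points. The symbol $\omega$ and the labeling $x_1, x_2, x_3$ of the sheets are introduced only for this illustration.

```latex
\begin{align*}
  x^3 &= t^3 - t^2 = t^2\,(t - 1), \qquad \omega = e^{2\pi i/3}, \\
  % Loop around t = 0: t^2 acquires the factor e^{4\pi i}, and t - 1 returns to its value.
  \sigma_0 &: \; x \mapsto e^{4\pi i/3}\,x = \omega^2 x, \\
  % Loop around t = 1: t - 1 acquires the factor e^{2\pi i}, and t^2 returns to its value.
  \sigma_1 &: \; x \mapsto e^{2\pi i/3}\,x = \omega x, \\
  % The composite multiplies x by \omega^3 = 1, so it is the identity.
  \sigma_0\sigma_1 &: \; x \mapsto \omega^3 x = x .
\end{align*}
```

If the sheets above the base point are labeled by the roots $x_1$, $x_2 = \omega x_1$, $x_3 = \omega^2 x_1$, then multiplication by $\omega$ is the cycle $(123)$ and multiplication by $\omega^2$ is its inverse $(321)$, in agreement with the result obtained from the point at infinity.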


15.10 THE FUNDAMENTAL THEOREM OF ALGEBRA 


A field $F$ is algebraically closed if every polynomial of positive degree with coefficients in
$F$ has a root in $F$. The Fundamental Theorem of Algebra asserts that the field of complex
numbers is algebraically closed.


Theorem 15.10.1 Fundamental Theorem of Algebra. Every nonconstant polynomial with 
complex coefficients has a complex root. 


There are several proofs of this theorem, and one of them is particularly appealing. 
We present it in outline. We must prove that a nonconstant polynomial 


(15.10.2) $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$


with complex coefficients has a complex root. If $a_0 = 0$, then 0 is a root, so we may assume
that $a_0 \neq 0$.

The rule $y = f(x)$ defines a function from the complex $x$-plane to the complex $y$-plane.
Let $C_r$ denote a circle of radius $r$ about the origin in the complex $x$-plane, parametrized as
$x = re^{i\theta}$, with $0 \le \theta < 2\pi$. We inspect the image $f(C_r)$ of $C_r$.

To warm up, we consider the function defined by the polynomial $y = x^n = r^n e^{in\theta}$. As
$\theta$ runs from $0$ to $2\pi$, the point $x$ travels once around the circle of radius $r$. At the same time,
$n\theta$ runs from $0$ to $2\pi n$. The point $y$ winds $n$ times around the circle of radius $r^n$.

Now let $f$ be the polynomial (15.10.2). For sufficiently large $r$, $x^n$ is the dominant term
in $f(x)$. To make this precise, let $M$ be the maximum absolute value of the coefficients $a_i$ of
$f$. Then if $|x| = r > 10nM$ and $r \ge 1$ (so that $|x|^i \le |x|^{n-1}$ for $i < n$),


$|f(x) - x^n| = |a_{n-1}x^{n-1} + \cdots + a_1 x + a_0| \le nM|x|^{n-1} < \frac{1}{10} r^n$.


It follows from this inequality that, as $\theta$ runs from $0$ to $2\pi$ and $x^n$ winds $n$ times around
the circle of radius $r^n$, $f(x)$ also winds around the origin $n$ times. A good way to visualize
this conclusion is with the dog-on-a-leash model. If someone walks a dog $n$ times around a
large circular path, the dog also goes around $n$ times, though perhaps following a different
path. This will be true provided that the leash is shorter than the radius of the path. Here $x^n$
represents the position of the person at the time $\theta$, and $f(x)$ represents the position of the
dog. The radius of the path is $r^n$ and the length of the leash is $\frac{1}{10} r^n$.

We vary the radius $r$. Since $f$ is a continuous function, the image $f(C_r)$ will vary
continuously with $r$. When the radius $r$ is very small, $f(C_r)$ makes a small loop around the
constant term $a_0$ of $f$. This small loop won’t wind around the origin at all. But as we just
saw, $f(C_r)$ winds $n$ times around the origin if $r$ is large enough. The only explanation for
this is that for some intermediate radius $r'$, $f(C_{r'})$ passes through the origin. This means
that for some point $\alpha$ on the circle $C_{r'}$, $f(\alpha) = 0$. Then $\alpha$ is a root of $f$.
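The winding behavior used in this outline can be observed numerically. The following is a minimal sketch under stated assumptions: the sample polynomial $x^3 + x + 1$, the radii, and the sample count are choices made for this illustration. The winding number of $f(C_r)$ about the origin is estimated by adding up the small changes in the argument of $f(x)$ as $x$ moves once around $C_r$.

```python
import cmath
import math

def f(x):
    # Sample monic polynomial (an assumption for this sketch): n = 3, M = 1.
    return x**3 + x + 1

def winding_number(r, samples=5000):
    # Sum the increments of arg f(x) as x = r e^{i theta} goes once around C_r.
    total = 0.0
    prev = f(complex(r, 0))
    for k in range(1, samples + 1):
        cur = f(r * cmath.exp(2j * math.pi * k / samples))
        # phase(cur / prev) is the small signed angle from prev to cur.
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi))

for r in (0.3, 1.0, 40.0):
    print(r, winding_number(r))
```

For this polynomial the small circle $r = 0.3$ gives winding number 0 and the large circle $r = 40 > 10nM$ gives winding number 3. The intermediate value at $r = 1$ reflects the single real root of absolute value less than 1. The winding number changes exactly at the radii where $f(C_r)$ passes through the origin.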


I don’t consider this algebra,
but this doesn’t mean that algebraists can’t do it. 


—Garrett Birkhoff 


EXERCISES 


Section 1 Examples of Fields 
1.1. Let $R$ be an integral domain that contains a field $F$ as a subring and that is finite-dimensional
when viewed as a vector space over $F$. Prove that $R$ is a field.


1.2. Let $F$ be a field, not of characteristic 2, and let $x^2 + bx + c = 0$ be a quadratic equation
with coefficients in $F$. Prove that if $\delta$ is an element of $F$ such that $\delta^2 = b^2 - 4c$, then
$x = (-b + \delta)/2$ solves the quadratic equation in $F$. Prove also that if the discriminant
$b^2 - 4c$ is not a square, the polynomial has no root in $F$.


1.3. Which subfields of C are dense subsets of C? 


Section 2 Algebraic and Transcendental Elements 
2.1. Let $\alpha$ be a complex root of the polynomial $x^3 - 3x + 4$. Find the inverse of $\alpha^2 + \alpha + 1$ in
the form $a + b\alpha + c\alpha^2$, with $a, b, c$ in $\mathbb{Q}$.


2.2. Let $f(x) = x^n - a_{n-1}x^{n-1} + \cdots \pm a_0$ be an irreducible polynomial over $F$, and let $\alpha$ be
a root of $f$ in an extension field $K$. Determine the element $\alpha^{-1}$ explicitly in terms of $\alpha$
and of the coefficients $a_i$.


2.3. Let $\beta = \omega\sqrt[3]{2}$, where $\omega = e^{2\pi i/3}$, and let $K = \mathbb{Q}(\beta)$. Prove that the equation $x_1^2 + \cdots + x_k^2 = -1$
has no solution with $x_i$ in $K$.


Section 3 The Degree of a Field Extension


3.1. Let $F$ be a field, and let $\alpha$ be an element that generates a field extension of $F$ of degree 5.
Prove that $\alpha^2$ generates the same extension.


3.2. Prove that the polynomial $x^4 + 3x + 3$ is irreducible over the field $\mathbb{Q}[\sqrt[3]{2}]$.

3.3. Let $\zeta_n = e^{2\pi i/n}$. Prove that $\zeta_5 \notin \mathbb{Q}(\zeta_7)$.


3.4. Let $\zeta_n = e^{2\pi i/n}$. Determine the irreducible polynomial over $\mathbb{Q}$ and over $\mathbb{Q}(\zeta_3)$ of
(a) $\zeta_4$, (b) $\zeta_6$, (c) $\zeta_8$, (d) $\zeta_9$, (e) $\zeta_{10}$, (f) $\zeta_{12}$.


3.5. Determine the values of $n$ such that $\zeta_n$ has degree at most 3 over $\mathbb{Q}$.


3.6. Let $a$ be a positive rational number that is not a square in $\mathbb{Q}$. Prove that $\sqrt[4]{a}$ has degree 4
over $\mathbb{Q}$.

3.7. (a) Is $i$ in the field $\mathbb{Q}(\sqrt[4]{-2})$? (b) Is $\sqrt[3]{5}$ in the field $\mathbb{Q}(\sqrt[3]{2})$?

3.8. Let $\alpha$ and $\beta$ be complex numbers. Prove that if $\alpha + \beta$ and $\alpha\beta$ are algebraic numbers,
then $\alpha$ and $\beta$ are also algebraic numbers.

3.9. Let $\alpha$ and $\beta$ be complex roots of irreducible polynomials $f(x)$ and $g(x)$ in $\mathbb{Q}[x]$. Let
$K = \mathbb{Q}(\alpha)$ and $L = \mathbb{Q}(\beta)$. Prove that $f(x)$ is irreducible in $L[x]$ if and only if $g(x)$ is
irreducible in $K[x]$.

3.10. A field extension $K/F$ is an algebraic extension if every element of $K$ is algebraic
over $F$. Let $K/F$ and $L/K$ be algebraic field extensions. Prove that $L/F$ is an algebraic
extension.


Section 4 Finding the Irreducible Polynomial 


4.1. Let $K = \mathbb{Q}(\alpha)$, where $\alpha$ is a root of $x^3 - x - 1$. Determine the irreducible polynomial for
$\gamma = 1 + \alpha^2$ over $\mathbb{Q}$.

4.2. Determine the irreducible polynomial for $\alpha = \sqrt{3} + \sqrt{5}$ over the following fields.
(a) $\mathbb{Q}$, (b) $\mathbb{Q}(\sqrt{5})$, (c) $\mathbb{Q}(\sqrt{10})$, (d) $\mathbb{Q}(\sqrt{15})$.

4.3. With reference to Example 15.4.4(b), determine the irreducible polynomial for $\gamma =
\alpha_1 + \alpha_2$ over $\mathbb{Q}$.


Section 5 Constructions with Ruler and Compass 


5.1. Express $\cos 15^\circ$ in terms of real square roots.

5.2. Prove that the regular pentagon can be constructed by ruler and compass
(a) by field theory, (b) by finding an explicit construction.

5.3. Decide whether or not the regular 9-gon is constructible by ruler and compass.

5.4. Is it possible to construct a square whose area is equal to that of a given triangle?

5.5. Referring to the proof of Proposition 15.5.5, suppose that the discriminant $D$ is negative.
Determine the line that appears at the end of the proof geometrically.

5.6. Thinking of the plane as the complex plane, describe the set of constructible points as
complex numbers.


Section 6 Adjoining Roots


6.1. Let $F$ be a field of characteristic zero, let $f'$ denote the derivative of a polynomial $f$ in
$F[x]$, and let $g$ be an irreducible polynomial that is a common divisor of $f$ and $f'$. Prove
that $g^2$ divides $f$.

6.2. (a) Let $F$ be a field of characteristic zero. Determine all square roots of elements of $F$
that a quadratic extension of the form $F(\sqrt{a})$ contains.

(b) Classify quadratic extensions of $\mathbb{Q}$.

6.3. Determine the quadratic number fields $\mathbb{Q}[\sqrt{d}]$ that contain a primitive $n$th root of unity,
for some integer $n$.


Section 7 Finite Fields 


7.1. Identify the group $\mathbb{F}_4^+$.

7.2. Determine the irreducible polynomial of each of the elements of $\mathbb{F}_8$ in the list 15.7.8.

7.3. Find a 13th root of 2 in the field $\mathbb{F}_{13}$.

7.4. Determine the number of irreducible polynomials of degree 3 over $\mathbb{F}_3$ and over $\mathbb{F}_5$.

7.5. Factor $x^9 - x$ and $x^{27} - x$ in $\mathbb{F}_3$.

7.6. Factor the polynomial $x^{16} - x$ over the fields $\mathbb{F}_4$ and $\mathbb{F}_8$.

7.7. Let $K$ be a finite field. Prove that the product of the nonzero elements of $K$ is $-1$.

7.8. The polynomials $f(x) = x^3 + x + 1$ and $g(x) = x^3 + x^2 + 1$ are irreducible over $\mathbb{F}_2$. Let
$K$ be the field extension obtained by adjoining a root of $f$, and let $L$ be the extension
obtained by adjoining a root of $g$. Describe explicitly an isomorphism from $K$ to $L$, and
determine the number of such isomorphisms.

7.9. Work this problem without appealing to Theorem (15.7.3). Let $F = \mathbb{F}_p$.

(a) Determine the number of monic irreducible polynomials of degree 2 in $F[x]$.

(b) Let $f(x)$ be an irreducible polynomial of degree 2 in $F[x]$. Prove that $K = F[x]/(f)$
is a field of order $p^2$, and that its elements have the form $a + b\alpha$, where $a$ and $b$ are
in $F$ and $\alpha$ is a root of $f$ in $K$. Moreover, every such element with $b \neq 0$ is the root
of an irreducible quadratic polynomial in $F[x]$.

(c) Show that every polynomial of degree 2 in $F[x]$ has a root in $K$.

(d) Show that all the fields $K$ constructed as above for a given prime $p$ are isomorphic.

*7.10. Let $F$ be a finite field, and let $f(x)$ be a nonconstant polynomial whose derivative is the
zero polynomial. Prove that $f$ cannot be irreducible over $F$.

7.11. Let $f = ax^2 + bx + c$ with $a, b, c$ in a ring $R$. Show that the ideal of the polynomial ring
$R[x]$ that is generated by $f$ and $f'$ contains the discriminant, the constant polynomial
$b^2 - 4ac$.

7.12. Let $p$ be a prime integer, and let $q = p^r$ and $q' = p^k$. For which values of $r$ and $k$ does
$x^q - x$ divide $x^{q'} - x$ in $\mathbb{Z}[x]$?

7.13. Prove that a finite subgroup of the multiplicative group of any field $F$ is a cyclic group.

7.14. Find a formula in terms of the Euler $\phi$ function for the number of irreducible polynomials
of degree $n$ over the field $\mathbb{F}_p$.


Section 8 Primitive Elements 


8.1. Prove that every finite extension of a finite field has a primitive element.

8.2. Determine all primitive elements for the extension $K = \mathbb{Q}(\sqrt{2}, \sqrt{3})$ of $\mathbb{Q}$.
Section 9 Function Fields


9.1. Let $f(x)$ be a polynomial with coefficients in a field $F$. Prove that if there is a rational
function $r(x)$ such that $r^2 = f$, then $r$ is a polynomial.


9.2. Determine the branch points and the gluing data for the Riemann surfaces of the
following polynomials.


(a)x?-2 +1, (b) x*-t-1, (c) x3 —3tx—4t, @) 3 — 3x2 -1, 
; (e) x? — t¢-1), @) x? —3ex2 +2, x4 + 4x42, Gh) xe —3rx—2-2. 


9.3. (a) Determine the number of isomorphism classes of function fields K of degree 3 over 
F = C(t) that are ramified only at the points 1 and -1. 


(b) Describe gluing data for the Riemann surface corresponding to each isomorphism 
class of fields as a pair of permutations. 


(c) For each isomorphism class, find a polynomial $f(t, x)$ such that $K = F[x]/(f)$
represents the isomorphism class.

*9.4. Prove the Riemann Existence Theorem for quadratic extensions.
Hint: Show that up to isomorphism, a quadratic extension of $F$ is described by the finite
set $\{p_1, \ldots, p_k\}$ of its true branch points.


*9.5. Write a computer program that determines the branch points $p_i$ and the permutations
$\sigma_i$ for the Riemann surface of a given polynomial.


Section 10 The Fundamental Theorem of Algebra 


10.1. Prove that the subset of $\mathbb{C}$ consisting of the algebraic numbers is algebraically closed.

10.2. Construct an algebraically closed field that contains the prime field $\mathbb{F}_p$.


*10.3. With notation as at the end of the section, a comparison of the images $f(C_r)$ for varying
radii shows another interesting geometric feature: For large $r$, the curve $f(C_r)$ makes $n$
loops around the origin. Its total curvature is $2\pi n$. Assuming that the coefficient $a_1$ is not
zero, the linear term $a_1 z + a_0$ dominates $f(z)$ for small $z$. Then for small $r$, $f(C_r)$ makes
a single loop around $a_0$. Its total curvature is only $2\pi$. Something happens to the loops as
$r$ varies. Explain.


*10.4. Write a computer program to illustrate the variation of $f(C_r)$ with $r$.


Miscellaneous Exercises 
M.1. Let $K = F(\alpha)$ be a field extension generated by a transcendental element $\alpha$, and let $\beta$
be an element of $K$ that is not in $F$. Prove that $\alpha$ is algebraic over the field $F(\beta)$.

M.2. Factor $x^7 + x + 1$ in $\mathbb{F}_7[x]$.


*M.3. Let $f(x)$ be an irreducible polynomial of degree 6 over a field $F$, and let $K$ be a quadratic
extension of $F$. What can be said about the degrees of the irreducible factors of $f$ in
$K[x]$?
M.4. (a) Let $p$ be an odd prime. Prove that exactly half of the elements of $\mathbb{F}_p^\times$ are squares and
that if $\alpha$ and $\beta$ are nonsquares, then $\alpha\beta$ is a square.

(b) Prove the same assertion for any finite field of odd order.

(c) Prove that in a finite field of even order, every element is a square.

(d) Prove that the irreducible polynomial for $\gamma = \sqrt{2} + \sqrt{3}$ over $\mathbb{Q}$ is reducible modulo
$p$ for every prime $p$.

*M.5. Prove that any element of $GL_2(\mathbb{Z})$ of finite order has order 1, 2, 3, 4, or 6

(a) by using field theory.

(b) by applying the Crystallographic Restriction.

*M.6. (a) Prove that a rational function $f(t)$ that generates the field $\mathbb{C}(t)$ of all rational
functions defines a bijective map $T' \to T'$.

(b) Prove a rational function $f(x)$ generates the field of rational functions $\mathbb{C}(x)$ if and
only if it is of the form $(ax + b)/(cx + d)$, with $ad - bc \neq 0$.

(c) Identify the group of automorphisms of $\mathbb{C}(x)$ that are the identity on $\mathbb{C}$.

*M.7. Prove that the homomorphism $SL_2(\mathbb{Z}) \to SL_2(\mathbb{F}_p)$ obtained by reducing the matrix
entries modulo $p$ is surjective.
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311-314, 318, 322-325, 328, 
419-420, 422-428, 430, 437, 439, 
442-443, 445-446 

augmented, 12-13, 16 

coefficient, 35-36, 129, 139, 208, 247 

column, 1-4, 8-9, 12-16, 18-20, 24, 
26-28, 30-31, 33, 35-36, 75, 83, 87, 
91, 100, 107-108, 111, 113, 126-127, 
133-136, 152, 184, 234-236, 257, 
286, 290, 295, 423-427, 430, 443 

defined, 2-3, 5-6, 13, 19, 23-24, 26, 34, 
36, 39, 52, 72-75, 87, 110, 120, 123, 
126-127, 129, 133, 136, 152-154, 
159, 208, 228, 233, 235, 240, 244, 
247, 250, 256-258, 264, 269, 273, 
277-278, 286, 288-291, 293-294, 
312-313, 324, 419, 430, 439, 442 

equations, 4, 6, 12-16, 32-33, 
35-36, 72, 83-84, 91, 100, 120, 152, 


479 


223-225, 247, 268, 284-285, 287, 
289, 423, 443 
equivalence, 52, 72, 76, 269 
identity, 6, 8, 10-11, 15-16, 20, 22-23, 
39-40, 43, 47, 50-52, 62, 70, 73-76, 
107, 120, 127, 130, 132, 134, 137, 
184-185, 199, 211, 225, 233, 235, 
237, 268, 273, 277-279, 281-282, 
288, 290, 295, 313, 324, 328, 
422-423, 442 
multiplying, 2-3, 8, 10, 12, 15, 21, 47, 
423, 426 
notation, 4-5, 10, 20-21, 23-24, 28, 37, 
40-41, 43, 47, 72, 74, 76, 87, 108, 
120, 123, 127, 133-134, 140, 150, 
199, 232, 235, 268-270, 273, 286, 
293, 296, 312-314, 427 
row, 1-4, 8, 10-16, 19-24, 27-28, 30- 
35, 71, 77, 100, 102, 107, 110-111, 
127, 133- 134, 257, 423-426, 430, 
437, 439, 443 
scalar multiplication, 2, 5, 87, 91, 235, 
240 
square, 2, 6-8, 10, 15-16, 18, 24, 
32-35, 101, 111, 130, 134, 154, 192, 
235, 250, 259-260, 276, 283, 298, 
419-420, 424 
zero, 6-8, 11, 13-16, 21-24, 26-27, 
29-31, 75, 84, 91, 100-102, 110-111, 
123-124, 126-127, 130, 132, 134, 
139, 141, 236, 240, 244, 250, 269, 
271, 273, 275, 277-278, 281-282, 
288, 290-291, 295, 299, 311-314, 
324, 328, 419-420, 422-423, 425, 
427, 430, 439, 442 
Matrix, 1-24, 26-36, 37-40, 43, 47, 
51,71, 73, 75-77, 79-81, 83-84, 
87-89, 91, 94-96, 99-102, 103-132, 
133-154, 162, 164, 184, 189, 
192-193, 199, 208, 223, 231-248, 
250, 252-262, 263-264, 268-271, 
273-281, 283-291, 293-296, 298 
302, 305-307, 311-315, 318-319, 
321-324, 383, 386, 415, 419-431, 
434-435, 437-446, 481 
Matrix equations, 6 
Maximum, 36, 150, 378, 476 
Mean, 9, 54, 61, 98, 112, 179, 201, 217, 
232, 328-329, 358, 361, 370, 395, 
469, 471, 477 
defined, 54, 98, 179, 217, 329, 358, 
361, 469 
quadratic, 395, 477 
Means, 6, 19-20, 24-25, 38-39, 48-49, 
51, 53, 55-56, 60-61, 83, 92, 112, 
124, 128, 178, 182, 200, 202, 236— 
238, 245, 254, 274, 296, 300, 311, 
338-339, 341, 350, 353, 355, 360, 
393, 401, 409, 422, 428, 431, 436, 
438, 454, 461, 466, 469, 471, 477 
Measures, 4, 402 
Midpoint, 393 
Minimum, 392, 402 
Models, 37, 79 
Monomials, 98, 262, 329-332, 336, 350, 
422, 439, 454-455 
coefficient of, 329, 331 
degree of, 329, 331, 454-455 


Multiples, 43, 66, 80, 86, 88, 130, 153, 
166, 170, 181, 184, 199, 277, 335, 
337, 365-366, 376, 426 

common, 80, 86, 181, 366 

Multiplication, 2, 4-10, 12, 18, 21-23, 
26, 31-33, 35, 37-42, 46, 57-58, 
61, 64-65, 67-69, 79, 81-82, 
85-87, 91, 96, 100, 103, 105-106, 
108-110, 112-114, 127-131, 
137-139, 141-142, 144, 147-148, 
154, 177-178, 180, 183, 192-193, 
197-198, 207-209, 211-214, 218, 
223, 235, 240, 246, 258, 261-262, 
270, 279-281, 287, 290, 297, 301, 
308, 311, 314, 321-325, 327-328, 
330, 332-335, 339-340, 343, 345, 
347, 350, 358, 361-362, 391-395, 
400-401, 411, 413, 417-418, 420— 
421, 423, 426, 428, 430, 438-443, 
445-446, 451, 460 

of fractions, 347, 350, 361 
of integers, 41, 46, 61, 81-82, 334-335, 
339, 391, 393, 395, 400, 413, 417, 
420 
Multiplicative inverses, 82, 328-329 


N 
n factorial, 42 
Natural numbers, 55, 69 
Negative exponents, 360 
Notation, 4—5, 10, 17, 20-21, 23-25, 28, 
37-38, 40-41, 43, 46-49, 53-55, 57, 
$9_60, 65-67, 72, 74, 76, 82, 87, 
94-95, 105, 108, 120-121, 123, 127, 
133-134, 140, 142, 149-150, 158, 
162, 165, 177, 179, 181, 198-199, 
209, 213, 217, 232, 235, 242, 248, 
268-270, 273, 286, 293, 296, 310, 
312-315, 330-333, 336, 343, 360, 
372-373, 385, 387, 393, 395, 399, 
408, 410, 418, 427, 438, 447, 449, 
464, 466, 469, 480 
delta, 134 
exponential, 48, 95, 149-150 
interval, 149, 269 
limit, 142 
set, 17, 24-25, 37-38, 40-41, 43, 
46-47, 49, 53-55, 59-60, 65-67, 72, 
74, 76, 82, 87, 105, 121, 123, 127, 
134, 177, 179, 181, 198, 209, 213, 
217, 268, 296, 310, 330, 332, 336, 
343, 360, 385, 410, 418, 427, 438, 
464, 469, 480 
sigma, 4-5 
summation, 4—5, 20, 28, 314 
nth power, 223 
nth root, 70, 479 
Numbers, 1—2, 5, 19, 33, 37, 40-41, 
43,51, 55, 61-62, 66, 69, 72, 74, 
81-82, 86-87, 97, 99-100, 102, 
117, 124, 126, 128, 139, 146, 149, 
153, 166, 172, 177, 187, 198, 226, 
231, 234, 236-237, 242, 246-247, 
250, 258, 262, 269, 273-275, 
289-290, 313, 320, 327-329, 
335-336, 342-343, 350-351, 358, 
360, 362, 364, 371, 373, 384, 387, 


389, 395, 400, 406-407, 422, 431, 
440, 442-443, 446, 447-448, 450, 
454, 457-461, 473, 476, 478, 480 

positive, 1, 19, 41, 51, 62, 69, 74, 102, 
117, 124, 128, 149, 166, 231, 234, 
236-237, 242, 250, 258, 262, 275, 
289, 327, 343, 358, 371, 373, 387, 
395, 406-407, 458, 460-461, 476, 
478 

prime, 61, 72, 82, 86, 100, 102, 226, 
364, 373, 406, 440, 447, 454, 460, 
480 

rational, 81, 99, 177, 327, 350-351, 
358, 371, 373, 387, 395, 440, 446, 
447-448, 454, 461, 478, 480 

real, 2, 19, 33, 37, 40-41, 43, 51, 
62, 66, 69, 72, 74, 81-82, 86-87, 
97, 102, 117, 128, 146, 149, 166, 
172, 177, 231, 234, 236-237, 242, 
246-247, 250, 258, 262, 269, 
273-275, 289-290, 327-329, 336, 
342-343, 358, 360, 362, 371, 387, 
389, 395, 406-407, 442, 448, 450, 
457-461, 478 

whole, 51, 86, 198, 273, 336 


oO 
Ordered pair, 184 
Origin, 80, 105, 136-137, 157-164, 170- 
172, 176, 179-180, 189-190, 251, 
262, 275, 290, 293, 323, 349-350, 
355, 408-409, 473-477, 480 
coordinate system, 159-160, 163, 176 
symmetry, 157, 159, 161, 163, 171, 
179, 189-190, 293 
Orthogonal vectors, 134, 254, 256, 294 


P 
Parabola, 248-249 
equation of, 248-249 
Parallel lines, 162 
vectors, 162 
Parallelogram law, 80, 113, 258 
Parallelograms, 428 
Partial fractions, 382-383 
defined, 383 
Paths, 76, 102, 106, 252, 277, 472 
definition of, 277 
Patterns, 173-175, 190, 339 
wallpaper, 173 
Pentagons, 188 
Permutations, 24—27, 29, 34, 41-42, 49- 
50, 58, 62-63, 179, 183, 193-194, 
203, 215, 218, 220-222, 225, 229, 
299, 419, 472-475, 480 
defined, 24, 26, 34, 63, 179, 193-194, 
220, 222, 229, 419, 473 
Plane, 33, 36, 43, 53, 72-73, 80, 105, 109, 
113, 136, 154, 155-157, 160-164, 
166-180, 183-184, 187-191, 194— 
195, 223, 228, 244, 248, 251, 259, 
264-265, 270-271, 275, 277, 279, 
287, 297, 327, 349, 351, 354-355, 
357-358, 361-362, 365, 387, 
389, 391-393, 400, 406, 408-411, 
415, 443, 455, 457, 469-471, 473, 
475-476, 478 


480 


Point, 2, 14, 18, 22, 27, 36, 45, 52, 54, 


73, 76, 79, 118, 136, 159, 161-164, 
167-174, 176, 178-180, 182, 185, 
188-191, 194-195, 200-201, 209, 
216, 224, 247, 251-252, 265-269, 
271-273, 275, 277, 280, 282, 286, 
297, 329, 344, 349-351, 353, 355, 
357, 362, 365, 386, 392, 402, 406, 
408-409, 415, 422, 427, 430, 441, 
451, 455-458, 461, 470-477 


Points, 17, 21, 31, 35-36, 43, 63, 75-76, 


134, 157, 159, 161, 164, 167, 170, | 
172, 174, 176, 178-180, 185, 

188, 190-191, 194, 251-252, 257, 

260, 265-267, 270-272, 279, 281, 

286, 290, 316, 327, 340, 349, 351, 

354-355, 357, 361-362, 393, 400, 

408-411, 415, 427-428, 455-458, 

462, 469-475, 478, 480 


Polygons, 188 
regular, 188 
Polynomial, 17, 35, 99, 102, 103, 


114-122, 125—126, 128-130, 132, 

136, 139-141, 152, 199, 247-251, 
256, 259, 262, 269, 284, 289, 302, 
315, 327-334, 336-338, 341-355, 
357-362, 364-365, 367, 369-379, 
381, 383-386, 387-390, 396, 399, 
404-405, 412, 414-415, 417, 419, 
422-423, 433-434, 437-443, 445- 
446, 447-455, 459-470, 473-481 


Polynomial equations, 284, 289, 352 
Polynomial functions, 103, 140, 330, 462 
Polynomials, 29, 86, 99, 102, 128, 


139-140, 199, 257, 262, 315, 
328-334, 336-338, 342-344, 346, 
348-354, 357, 360-362, 363, 367, 
369-379, 383-384, 386, 387, 415, 
422, 433-434, 437-441, 443-444, 
448-450, 455, 461-462, 464-465, 
467-470, 474, 478-480 

addition of, 86, 330, 333, 343 

defined, 86, 257, 315, 329-330, 333— 
334, 336, 348, 350, 354, 360-362, 
383, 438-439, 448-449, 469-470 

degree of, 329, 331, 353, 357, 371, 376, 
386, 434, 448, 450, 455, 461, 467 

factoring, 363, 367, 369-372, 374-379, 
384, 386 

in one variable, 332, 337-338, 349, 
353—354, 369-370, 375, 437 

multiplying, 346, 350, 354, 438 

prime, 86, 102, 199, 349, 363, 367, 
369-370, 372-378, 383, 386, 415, 
438, 440, 444, 462, 464-465, 467, 
479-480 

quadratic, 102, 354, 362, 371, 379, 387, 
415, 450, 478-480 


Positive integers, 1, 45, 70, 76, 228, 358, 


367, 373, 390, 444 


Power, 36, 40, 46, 72-73, 123, 125, 130, 


150, 166, 197, 199, 204-207, 209, 
213, 223, 282-283, 302, 305-307, 
324, 358-359, 361, 378, 383, 434, 
436-437, 458-460, 464-467 

defined, 36, 72~73, 123, 197, 302, 307, 
324, 358-359, 361, 383 


Power series, 36, 358-359, 36}, 383 

Powers, 40, 46-47, 64, 75, 83, 85, 120, 
125, 132, 150, 165, 199, 302, 307, 
327, 329, 360, 378, 386, 390, 434, 
438, 454-455, 467 

Prime factorization, 369, 380 

Prime notation, 158 

Prime polynomials, 440 

Principal, 335-338, 341-346, 349, 
353-354, 359-362, 363-370, 374, 
381, 383-384, 386, 387, 391-401, 

_ 403-406, 413-415, 431-433, 437, 

440, 448, 461 

Product, 2-5, 7-10, 15-16, 23-27, 30-35, 
37, 39-40, 42-43, 45, 47-48, 51, 60, 
64-68, 70-71, 73~—77, 85, 87-88, 
94, 99-100, 111, 116-118, 127, 131, 
133, 135-136, 142-145, 147-148, 
150-152, 158-159, 161, 166, 168, 
189, 191, 193, 200, 204, 207-208, 
210-211, 213-216, 218, 231, 
233-235, 237, 239, 241-244, 252, 
261-262, 263, 268, 273, 276, 278, 
286-288, 302-304, 312, 316, 318, 
323, 330-331, 338-339, 343-347, 
349, 354, 359-362, 363-364, 
367-370, 372-375, 377-380, 386, 
387, 394-399, 401, 403-405, 413, 
420, 436, 440, 442-444, 446, 454, 
460, 463, 468, 475, 479 

Product Rule, 142-143, 145, 148, 150, 
152, 278, 359, 443, 463 

for differentiation, 145, 152, 443, 463 
Pythagoras, 134, 460 


Q 
Quadratic, 102, 247-251, 260, 354, 
362, 371, 379, 387-398, 400, 402, 
404-410, 412-415, 447, 450-452, 
457-458, 477-480 
Quadratic equations, 247, 457-458 
defined, 247, 457 
Quadratic formula, 452 
discriminant, 452 
Quadratic polynomials, 102, 354, 362 
defined, 354, 362 
Quaternions, 290 
Quotient, 66-69, 74-75, 179, 189, 214, 
216-217, 227, 272, 275, 282, 319, 
338-344, 346, 351, 359-361, 381, 
396-397, 399, 413, 418, 428, 433, 
442-443, 446, 461 
functions, 339, 351, 361 
real numbers, 66, 69, 74, 275, 342, 
360, 442 
Quotient Rule, 443 
Quotients, 395-396, 433 


R 

Range, 39, 44, 46-49, 109, 111, 153, 
207, 334, 365, 378, 385, 415, 418, 
425, 466 

defined, 39, 48, 153, 334 

Ratio, 150, 393, 428 

Ratio test, 150 

Rational functions, 348, 350-351, 353, 
374, 383, 447, 449, 468, 470, 481 


domain, 353, 374, 383 
Rational numbers, 81, 99, 327, 358, 373, 
395, 446, 447-448, 454, 461 
Ratios, 449 
Ray, 290 
defined, 290 
Rays, 290 
Real axis, 177, 275, 395, 470 
Real numbers, 2, 19, 37, 40-41, 51, 62, 
66, 69, 72, 74, 81-82, 86, 97, 102, 
128, 149, 166, 172, 177, 231, 236— 
237, 246-247, 250, 273-275, 290, 
329, 336, 342, 360, 362, 406-407, 
442, 448, 450, 457-460 
absolute value, 69, 166, 275 
complex, 2, 41, 81-82, 86, 128, 149, 
177, 236-237, 274-275, 342, 362, 
406, 442, 448, 450 
defined, 2, 19, 66, 69, 72, 74, 86, 149, 
231, 247, 250, 273-274, 290, 329, 
336, 360, 362, 407, 442, 448, 457 
imaginary, 149, 407, 450 
integers, 41, 66, 81-82, 102, 329, 342, 
406-407 
properties of, 40, 69, 81-82, 236 
rational, 81, 177, 448 
real, 2, 19, 37, 40-41, 51, 62, 66, 69, 
72, 74, 81-82, 86, 97, 102, 128, 
149, 166, 172, 177, 231, 236-237, 
246-247, 250, 273-275, 290, 329, 
336, 342, 360, 362, 406-407, 442, 
448, 450, 457-460 
Rectangle, 170, 392-393 
Rectangles, 400 
similar, 400 
Reflection, 63, 135-137, 139, 160-161, 
163-166, 168, 172, 174, 176-177, 
184, 189, 214, 223, 258, 293, 323 
defined, 63, 136, 164, 258, 293 
in calculus, 164 
Regular polygons, 188 
Relations, 7, 42, 52-53, 55, 72, 77, 95, 
119, 153, 160, 165, 182, 197, 206— 
208, 210-212, 214-222, 227-228, 
264, 268, 273, 293, 303, 309, 
312-314, 319, 321, 323, 341-342, 
348, 360, 381, 402, 404-406, 
428-431, 435-437, 439-441, 
443-444, 464-465 
defined, 52-53, 55, 72, 153, 197, 208, 
216-217, 220, 222, 228, 264, 273, 
293, 312-313, 321, 348, 360, 430, 
439 
Remainder, 44-45, 61, 73, 331-332, 
334, 336-338, 343-344, 353, 360, 
363-366, 370-371, 377, 382-383, 
386, 425, 462 
Remainder theorem, 73, 360, 382, 386 
Right triangles, 460 
Roots, 17, 70, 75, 85, 116-119, 121, 
126, 139-141, 302, 306, 344, 346, 
353-355, 357, 371, 376, 378-379, 
383-384, 388, 390, 396, 443, 450, 
454-455, 461-469, 473-475, 478 
cube root, 450 
nth root, 70 
of the equation, 353-354 


431 


of unity, 70, 75, 302, 306, 378 

Rotations, 133-134, 136, 138-139, 151, 
154, 160-162, 166, 169, 172-174, 
181-182, 184-187, 190, 194, 200, 
215, 264, 272, 286-287, 294-295, 
306, 309, 320 

Row operations, 10-16, 35, 100, 424 

Run, 5, 25, 35, 60-61, 83, 168, 201, 262, 
269-270, 297, 382 


S 
Sample, 175 
Scalar multiplication, 2, 5, 79, 85-87, 91, 
96, 103, 142, 235, 240, 350, 417-418, 
438, 451 
matrices, 2, 5, 87, 91, 235, 240 
vectors, 2, 79, 86-87, 91, 96, 235, 240, 
350, 417 
Scalars, 2, 20, 80-81, 86-87, 92, 112, 
115, 122, 129, 133, 143, 158, 231, 
251-252, 256, 322, 421, 439 
Second quadrant, 128 
Semicircle, 272 
Sequences, 98, 102, 362 
defined, 98, 362 
finite, 98, 102 
infinite, 98, 102 
Series, 36, 98, 146-147, 149-151, 153, 
275, 280, 285, 358-359, 361, 383 
defined, 36, 98, 149, 153, 358-359, 
361, 383 
mean, 98, 358, 361 
Sets, 5, 25, 48, 53-55, 57, 59-60, 63-64, 
69, 73, 87-88, 94, 98, 133, 147, 
169, 178, 191-193, 203, 210, 216, 
222, 224, 252, 264, 280, 295, 321, 
334, 351, 355, 358, 361, 403, 411, 
420, 469 
empty, 25, 55 
intersection, 210, 252, 361 
solution, 73, 88 
union, 53-54, 351, 361 
Sides, 6, 15, 41, 48, 65, 83, 114, 120, 
133, 135, 153, 158, 168, 186, 228, 
276, 353-354, 367, 373, 398, 471 
Signs, 20, 28-30, 465 
Simplification, 111 
Simplify, 7-8, 12, 14, 17, 21, 60, 123, 
140, 162, 186, 248, 302, 315, 423, 
431, 454 
complex number, 302 
defined, 60, 123, 302, 315 
Sine, 105, 270 
Slope, 286 
Solutions, 12-16, 32, 72, 80, 83-84, 
86-87, 89, 92-93, 95, 99-101, 105, 
115, 126, 142, 144-146, 148-149, 
152, 154, 199, 230, 274-275, 284, 
287-289, 351-352, 382, 415, 417, 
423, 425-426, 437, 443, 458 
of an equation, 230 
Speed, 233 
Spheres, 265-268, 286-287, 316 
surface area of, 316 
volume of, 316 
Square, 2, 6-8, 10, 15-16, 18, 24, 
32-35, 63, 94, 101, 109, 111-112, 


125, 130, 134, 154, 170, 174, 
191-194, 235, 242, 249-250, 
259-260, 276, 283, 298, 305, 
327, 344, 346, 355, 365, 380-382, 
385-386, 387-389, 398-399, 403, 
406-407, 419-420, 424, 451-452, 
460, 477-478, 481 
matrix, 2, 6-8, 10, 15-16, 18, 24, 
32-35, 94,101, 109, 111-112, 125, 
130, 134, 154, 192-193, 235, 242, 
250, 259-260, 276, 283, 298, 305, 
386, 419-420, 424, 481 
Square roots, 344, 346, 355, 478 
Squares, 134, 170, 188, 249, 283, 
380-382, 428, 481 
Standard form, 330 
Statements, 45, 125, 232, 269, 346, 381 
defined, 269 
Subset, 42-44, 47, 53-57, 63-64, 69, 
72-73, 75-76, 79, 81-82, 86, 90, 
92-93, 99, 102, 103, 157, 166, 172, 
176, 178, 182, 193, 195, 208-209, 
214, 217, 227, 229, 277, 280-281, 
285, 327-328, 335, 350-351, 355, 
357, 361, 391, 411, 418, 434, 444, 
469, 480 
Substitution, 36, 249, 297, 301, 312, 
333-334, 338, 344, 349, 351, 384; 
387, 422, 448, 467, 475-476 
Subtraction, 61, 81, 327-328, 460 
of integers, 61, 81 
Sum, 2, 4-5, 9, 26-27, 29-30, 35, 37, 60, 
64, 70, 73, 84-85, 96-97, 101, 109, 
112, 116-117, 123-124, 127-129, 
132, 134, 150, 158, 162, 172, 201, 
219, 239-240, 253, 256, 258, 276, 
278, 298-304, 308, 310, 313-314, 
317-318, 324-325, 330-331, 346, 
359, 380-381, 383, 419, 424, 431, 
434-440, 444, 446, 466 
derivative of, 150, 278, 359, 466 
Summation notation, 5, 20, 28 
defined, 5 
index of summation, 5 
Sums, 29, 96, 101, 153, 203, 240, 306, 
312, 323, 327, 340, 346, 348, 359, 
394, 435, 453 
Surface area, 316 
Symbols, 55, 61, 100, 161, 171, 180, 
212-213, 237, 332, 402 
Symmetry, 54, 56, 155-157, 159, 161, 
163, 165, 167-169, 171, 173-175, 
177-179, 181, 183-185, 187-191, 
193-195, 234-235, 252, 260, 279, 
293, 395 
line of, 163 


T 

Tables, 218, 283, 306, 318, 322 
Tangent, 277, 282, 289 
defined, 277, 289 


Transformations, 109, 264, 290, 312, 
422, 433 
translations, 290 
Translations, 157, 159-160, 162, 169, 
171, 176, 189-190, 194, 290, 349 
defined, 157, 159, 194, 290 
reflection, 160, 176, 189 
vertical, 189 
Triangles, 53-54, 178-179, 183, 188, 460 
congruent, 53-54, 178, 188 
equilateral, 179, 183, 188 
right, 460 
theorem, 460 


U 
Unit circle, 43, 69, 74, 136, 259, 
264-265, 267-268, 272, 275, 277, 316 
defined, 69, 74, 136, 264-265, 267, 
272, 275, 277 
Unit vectors, 134, 136, 257, 267-268, 
270, 286 


V 
Variables, 15, 37, 231, 234, 248, 
250-251, 262, 315, 331-334, 346, 
349-353, 357, 361, 374-375, 383, 
415, 422-423, 433, 441, 445, 447, 
458, 474 
functions, 350-351, 353, 361, 374, 
383, 447 
Variation, 473, 480 
Vectors, 2-4, 9-10, 18, 23, 26, 72, 75, 
79-80, 86-102, 104-105, 107-114, 
126-127, 129, 131, 133-136, 145, 
152, 160, 162, 164, 169-172, 174, 
177, 184, 189, 193, 231-235, 
237-245, 249, 253-254, 256-258, 
261-262, 267-268, 270, 277, 282, 
286-287, 290, 293-295, 297-299, 
302, 314, 323, 330, 350-351, 385, 
409, 412, 417, 423, 425-427, 429, 
432, 439, 441, 443, 446, 464 
addition, 2, 79-80, 86-87, 100, 105, 
113, 177, 258, 270, 293, 330, 350, 
385 
defined, 2-3, 23, 26, 72, 75, 86-87, 89, 
93, 98, 104, 110, 126-127, 129, 133, 
136, 145, 152, 164, 193, 231, 233, 
235, 240, 244, 256-258, 267, 277, 
286, 290, 293-294, 302, 330, 350, 
409, 439 
direction of, 257 
dot product, 133, 135, 231, 233-235, 
237, 241-244, 262, 268, 287, 446 
equality, 96, 126, 135, 238 
linear combination of, 9, 87-88, 90-92, 
94, 99, 111, 145, 350-351, 429 
orthogonal, 133-136, 152, 160, 162, 
164, 169, 171-172, 174, 189, 233, 
237-243, 245, 249, 253-254, 256-258, 
261-262, 270, 277, 286-287, 
294-295, 299, 314, 323 
parallel, 112-113, 162, 164, 172 
perpendicular, 162 
scalar multiplication, 2, 79, 86-87, 91, 
96, 235, 240, 350, 417 
scalar product, 100 
unit, 9, 18, 75, 101, 134, 136, 193, 257, 
267-268, 270, 277, 286, 290, 409 
zero, 23, 26, 75, 80, 86, 88-91, 96-98, 
100-102, 105, 109-112, 126-127, 
134, 162, 164, 169, 171, 238-240, 
242, 244-245, 249, 253-254, 
277, 282, 290, 295, 299, 314, 330, 
350-351, 409, 423, 425, 427, 429, 
432, 439, 464 
Velocity, 277 
linear, 277 
Vertex, 63, 174, 180-182, 185, 188, 
191-192, 194, 200, 215, 293, 309, 
393 
Vertical, 108, 154, 165, 189, 265, 
267-268, 426-427, 469 
Vertical axis, 165, 189, 265, 268 
Viewing, 55, 264, 301 
Volume, 18, 316-317, 323 


X 
x-axis, 164, 257, 323 
x-coordinate, 458 


Y 
Years, 4, 157, 391 


Z 
Zero, 6-8, 11, 13-16, 21-24, 26-27, 
29-31, 44-45, 75, 80, 82, 84-86, 
88-91, 96-98, 100-102, 105, 
109-112, 115, 119, 121-124, 
126-127, 130, 132, 134, 139, 141, 
148, 157-158, 161-162, 164, 166, 
169, 171, 236, 238-240, 242, 
244-245, 248-250, 253-254, 269, 
271, 273, 275, 277-278, 281-282, 
288, 290-291, 295, 299, 311-314, 
316-317, 324, 328-332, 334-338, 
341, 343-355, 357, 359-360, 362, 
363-366, 368, 370-371, 380, 
397, 409, 411, 418-423, 425, 427, 
429-430, 432-436, 439, 442, 448, 
452, 462-464, 466-468, 473-475, 
478-480 
exponent, 121, 123, 463 
matrix, 6-8, 11, 13-16, 21-24, 
26-27, 29-31, 75, 80, 84, 88-89, 
91, 96, 100-102, 105, 109-112, 
115, 119, 121-124, 126-127, 130, 
132, 134, 139, 141, 148, 162, 
164, 236, 238-240, 242, 244-245, 
248, 250, 253-254, 269, 271, 
273, 275, 277-278, 281, 288, 
290-291, 295, 299, 311-314, 324, 
419-423, 425, 427, 429-430, 
434-435, 439, 442 


For these special editions, the editorial team at Pearson has collaborated with educators across the 
world to address a wide range of subjects and requirements, equipping students with the best possible 
learning tools. 


This international edition preserves the cutting-edge approach and pedagogy of the original, but may 
also feature alterations, customization and adaptation from the United States version. 